Sequencing Nucleic Acid Polymers with Electron Microscopy

ABSTRACT

This invention relates to using an electron microscope to sequence by direct inspection of labeled, stretched DNA. This method will have higher accuracy, lower cost, and longer read length than current DNA sequencing methods.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority to U.S. Provisional Application No. 60/997,427, filed Oct. 4, 2007, and U.S. Provisional Application No. 61/132,960, filed Jun. 23, 2008, the disclosures of which are expressly incorporated herein by reference in their entirety.

FIELD OF THE INVENTION

The invention relates to methods of sequencing nucleic acids.

BACKGROUND OF THE INVENTION

Current DNA sequencing is done mostly by Sanger methods and other sequencing-by-synthesis methods. These methods suffer from high cost, short read lengths, and insufficient throughput.

Sequencing by electron microscopy has also been explored. The idea of sequencing by electron microscopy is not new. It was proposed by Richard Feynman only six years after the structure of DNA was discovered. However, it has never been successfully used to generate meaningful sequence information.

The transmission electron microscope (TEM) works by sending an electron beam through a sample and onto a detector or screen. Portions in the sample impede or deflect the beam, so that the pattern of electrons reaching the detector forms an image. In most situations, atoms of low atomic number (Z) produce very little contrast and are essentially invisible in the electron microscope. Ordinary DNA, comprising low-Z hydrogen, carbon, nitrogen, oxygen, and phosphorus atoms, shows almost no contrast in an electron microscope and is almost impossible to see against a supporting background. To visualize DNA using current electron microscopy techniques, the bases may be labeled with high-Z atoms or otherwise rendered detectable by TEM.

The first serious work on sequencing by electron microscopy was done by Beer, M. and Moudrianakis, E. N., 48(3) PNAS 409-416 (1962). Their initial work focused on heavy atom labels for DNA and on attempts to visualize heavy atoms in the electron microscope. Later work, and probably the best other work to date, was done by Whiting and Ottensmeyer. It was reported by Ottensmeyer:

“Heavy atom markers for thymine and adenine and guanine were developed and tried on model sequences, only to indicate that although the electron microscope and the chemistry were no longer major obstacles, specimen preparation was. The uncontrollable placing of a marked single-stranded nucleic acid polymer on the specimen support resulted in a rather non-uniform base-to-base spacing. Therefore an easy and accurate reading from a single molecule was not possible.” Ottensmeyer, F. P. (1979). “Molecular Structure Determination by High Resolution Electron Microscopy.” Ann. Rev. Biophys. Bioeng. 9129: 129-144. (Internal references omitted.)

There is great biomedical importance to having the ability to rapidly sequence individual genomes and other sequences, and as described above, improved methods are needed.

BRIEF SUMMARY OF THE INVENTION

The invention provides methods for using an electron microscope to sequence by direct inspection of labeled, stretched nucleic acid, such as DNA. The methods, devices, and compositions of the invention allow controllable placement of a nucleic acid on a substrate, so that there is consistent base-to-base spacing, allowing for accurate nucleic acid sequencing information to be obtained using electron microscopy. The invention may be implemented in a number of ways.

According to one aspect of the invention, a method for obtaining sequence information of a DNA polymer strand may include providing a solution comprising a DNA polymer strand, where the DNA polymer strand has been treated such that a plurality of DNA bases have been labeled with a contrast agent with base specificity or base selectivity, introducing a DNA binding tool into the solution and binding a section of the DNA polymer onto the tool, removing the tool from the solution, stretching the labeled DNA polymer strand into space such that the labeled DNA polymer strand is suspended between an air/solvent interface and the tool, depositing the stretched labeled DNA polymer strand onto a substrate, imaging the labeled DNA polymer strand using electron microscopy such that positions of labeled and unlabeled bases are determined, and correlating the positions of labeled and unlabeled bases with the sequence of the DNA polymer.
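
By way of non-limiting illustration only, the step of determining which bases are labeled from their imaged positions can be sketched in Python as follows. The function name `label_pattern`, the default spacing of 5 Å, and the assumption that label positions have already been measured along the strand axis from the first base are hypothetical choices made for this sketch; they are not part of the disclosed method.

```python
def label_pattern(label_positions, n_bases, spacing=5.0):
    """Map measured label positions to a per-base labeled/unlabeled pattern.

    label_positions: distances in angstroms along the strand, measured from
        the first base (hypothetical input format).
    n_bases: number of base sites in the imaged region.
    spacing: assumed consistent base-to-base spacing in angstroms.
    """
    pattern = [False] * n_bases
    for position in label_positions:
        index = round(position / spacing)  # nearest base site
        if 0 <= index < n_bases:           # ignore labels outside the region
            pattern[index] = True
    return pattern


if __name__ == "__main__":
    # Labels seen near 0, 10, and 15 angstroms on a five-base stretch.
    print(label_pattern([0.0, 10.2, 14.8], 5))
```

The sketch relies on consistent base-to-base spacing: if the spacing varied along the strand, dividing by a single `spacing` value would assign labels to the wrong base sites.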

The method may further include denaturing the DNA strand to generate single stranded DNA prior to labeling with the contrast agent. The denaturing step may be carried out by thermal denaturation or chemical denaturation. The method may also include depositing a layer of carbon on top of the labeled DNA polymer strand attached to the substrate prior to said imaging step. The method may also include labeling multiple strands from a DNA polymer sample with different labels and compiling the data obtained from imaging the multiple strands of the DNA polymer to obtain the complete sequence information.
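
As a non-limiting sketch of how data from differently labeled strands might be compiled, the following Python example narrows down the identity of each base by intersecting what each labeling experiment allows. The function name `compile_sequence`, the input format, and the assumptions that the patterns are already aligned base-for-base and that labeling is complete and error-free are all hypothetical simplifications.

```python
BASES = frozenset("ACGT")


def compile_sequence(experiments):
    """Combine aligned label patterns from differently labeled strands.

    experiments: list of (labeled_bases, pattern) pairs, where labeled_bases
        is a string such as "TC" naming the bases the label reacts with, and
        pattern is a list of booleans (True = label seen at that position).
    Returns one letter per position, or "N" where the call is ambiguous
    or the patterns contradict each other.
    """
    lengths = {len(pattern) for _, pattern in experiments}
    if len(lengths) != 1:
        raise ValueError("need at least one pattern, all of equal length")
    n_bases = lengths.pop()

    calls = []
    for i in range(n_bases):
        candidates = set(BASES)
        for labeled_bases, pattern in experiments:
            labeled = set(labeled_bases)
            # A label restricts the base to the labeled set; no label
            # restricts it to the remaining bases.
            candidates &= labeled if pattern[i] else (BASES - labeled)
        calls.append(next(iter(candidates)) if len(candidates) == 1 else "N")
    return "".join(calls)


if __name__ == "__main__":
    runs = [
        ("TC", [True, True, False, False]),   # label specific for T and C
        ("T", [True, False, False, False]),   # label specific for T only
        ("A", [False, False, True, False]),   # label specific for A only
    ]
    print(compile_sequence(runs))
```

With only the first experiment, every position would remain ambiguous; each additional labeling chemistry removes candidates until a single base remains.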

The DNA may be high molecular weight DNA having greater than about 100 kb. The stretching step may result in consistent base-to-base spacing in the DNA polymer strand. The base-to-base spacing may be in a range of about 3 Å to about 7 Å, and specifically may be about 5 Å. The DNA polymer strand may be stretched to a length of at least about 2 μm.
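
The relationship between base count, base-to-base spacing, and stretched length is simple arithmetic, sketched below in Python for illustration. The function names are hypothetical, and the calculation assumes perfectly uniform spacing along a straight strand.

```python
ANGSTROMS_PER_MICROMETER = 1e4  # 1 um = 10,000 angstroms


def stretched_length_um(n_bases, spacing_angstrom=5.0):
    """Length in micrometers of n_bases at a uniform base-to-base spacing."""
    return n_bases * spacing_angstrom / ANGSTROMS_PER_MICROMETER


def bases_spanned(length_um, spacing_angstrom=5.0):
    """Whole number of base spacings that fit in a stretched length."""
    return int(length_um * ANGSTROMS_PER_MICROMETER / spacing_angstrom)


if __name__ == "__main__":
    print(stretched_length_um(100_000))  # a 100 kb strand at 5 angstroms
    print(bases_spanned(2.0))            # a 2 um stretch at 5 angstroms
```

Under these assumptions, a 100 kb strand at 5 Å spacing is about 50 μm long, and a 2 μm stretch spans about 4,000 bases.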

Only a subset of bases of the DNA polymer strand may be labeled. The subset of bases may comprise thymines and cytosines. The subset of bases may comprise adenine and guanine.

The tool may be a needle such that a tip of the needle has been functionalized with a first coating that binds to a DNA polymer strand. The needle may be fabricated from glass, gold, tungsten, PMMA, polystyrene, PVC, or silicon. The coating may be a compound such as PMMA, polystyrene, PVB, or oligonucleotides. The needle may be functionalized with a second coating that does not bind to nucleic acids. The second coating may be at least one of octanethiol, hexanethiol, nonanethiol, decanethiol, and septanethiol. The tip of the needle may have a radius of curvature that is less than about 200 nm. The needle may be moved into and out of the solution at a rate in a range of about 1 nanometer/second to about 100 meters/second. Specifically, the needle may be moved into the solution at a depth having a range of about 1 Å to about 20 μm.

The contrast agent may be a high-Z atom labeling compound. The high-Z atom labeling compound may be Os-bipy, mercuric acetate, or platinum dimethylsulfoxide. The contrast agent may be a high-Z atom cluster label.

The DNA polymer strand may be attached to a substrate by employing shelf threading. The DNA polymer strand may be attached to either a support substrate or an imaging substrate. The DNA polymer strand may be attached to a support substrate by employing shelf threading followed by transferring the labeled DNA polymer strand on the support substrate to an imaging substrate by employing transfer printing. The DNA polymer strand may be attached to a support substrate by employing gap threading followed by transferring the labeled DNA polymer strand on the support substrate to an imaging substrate by employing swipe printing. The DNA polymer strands may be a plurality of DNA polymer strands and may be positioned as an array of parallel strands on a substrate. The stretched labeled DNA polymer strand may be single stranded.

According to one aspect of the invention, a method for obtaining nucleic acid sequence information may include determining by electron microscopy the positions of labeled and unlabeled bases of a nucleic acid strand within a region of at least 1000 contiguous bases of each of a plurality of labeled nucleic acid strands. Specifically, the positions of at least 1000 contiguous bases of a nucleic acid strand may be determined, and more specifically, the positions of at least 10,000 contiguous bases of a nucleic acid strand may be determined. The positions of labeled and unlabeled bases of at least about 20 individual strands may be determined, the positions of at least about 100 multiple strands may be determined, the positions of at least about 5,000 multiple strands may be determined, and the positions of at least about 10,000 multiple strands may be determined. The positions may be determined for differently labeled nucleic acid polymer strands from the same sample. The positions may be determined for differently labeled nucleic acid polymer strands of at least 200 contiguous bases, of at least 500 contiguous bases, of at least 1000 contiguous bases, or of at least 10,000 contiguous bases.

The nucleic acid sequence may be obtained at a rate of at least 1,000 bases/second. The nucleic acid strand may have a length of at least about 100 μm when extended. The nucleic acid may have base-to-base spacing in a range of 3 Å to about 7 Å between the bases, and specifically, about 5 Å. The nucleic acid sequence may be a DNA sequence.

According to another aspect of the invention, an article of manufacture may include a liquid having a plurality of nucleic acid polymer strands, a tool, and a single nucleic acid polymer strand having a first end in the liquid and a second end attached to the tool, where at least a portion of the single nucleic acid polymer strand is suspended in space between the tool and the liquid. The bases of the nucleic acid polymer strand may be extended such that there is consistent base-to-base spacing of the bases. The base-to-base spacing may be in a range of 3 Å to about 7 Å between the bases. The nucleic acid polymer strand may be extended such that the strand is linear.

According to yet another aspect of the invention, an article of manufacture may include a solid planar substrate, and at least one elongated nucleic acid polymer strand disposed on the planar substrate. The at least one elongated nucleic acid polymer strand may have consistent base-to-base spacing over a length of about 1000 base pairs. The article of manufacture may further include a film disposed on top of the at least one elongated nucleic acid polymer such that the at least one elongated nucleic acid polymer is sandwiched between the planar substrate and the film. The film may be composed of carbon or a low-Z element. The planar substrate may be composed of a material such as PDMS, carbon, boron, lithium, hydrogen, beryllium, aluminum, nitrides, nitride oxides, and combinations thereof.

The base-to-base spacing of the at least one elongated nucleic acid strand may be in a range of 3 Å to about 7 Å. The plurality of elongated nucleic acid strands may be substantially parallel to one another. The plurality of elongated nucleic acid strands may include about 1×10⁶ nucleic acid strands. The at least one elongated nucleic acid strand may be stretched to a length of at least about 2 μm. The at least one nucleic acid polymer strand may be labeled with at least one Z-labeling compound. The Z-labeling compound may be a high-Z atom labeling compound such as Os-bipy, mercuric acetate, platinum dimethylsulfoxide, or cluster labeling.

According to a further aspect of the invention, a method for sequencing at least 200 contiguous bases of a nucleic acid strand using electron microscopy may include labeling a plurality of nucleic acid strands with at least one Z-labeling compound, binding a single labeled nucleic acid strand from a solution containing a plurality of labeled nucleic acid strands onto a tool, stretching the single labeled nucleic acid strand into space such that the single labeled strand of nucleic acid is suspended between an air/solvent interface and a tip of the tool, attaching the stretched labeled nucleic acid to a substrate, and imaging the labeled nucleic acid strand using electron microscopy.

The method may further include pretreating a solution containing a plurality of nucleic acid strands with bisulfite to convert unmethylated cytosine bases to uracil bases prior to the labeling step. The method may also include denaturing a plurality of nucleic acid strands to generate single stranded nucleic acids prior to the labeling step. The denaturing step may be carried out by thermal denaturation or chemical denaturation. The method may also include depositing a layer of carbon on top of the labeled nucleic acid strand attached to the substrate prior to the imaging step.
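
The effect of the bisulfite pretreatment on a sequence can be modeled with a short Python sketch, given here for illustration only. The function name `bisulfite_convert` and the representation of methylation as a set of zero-based positions are hypothetical, and the model assumes complete, error-free conversion.

```python
def bisulfite_convert(sequence, methylated_positions=()):
    """Model bisulfite treatment: unmethylated C becomes U, methylated C stays.

    sequence: string over the letters A, C, G, T.
    methylated_positions: zero-based indices of methylated cytosines
        (hypothetical input format).
    """
    methylated = set(methylated_positions)
    return "".join(
        "U" if base == "C" and i not in methylated else base
        for i, base in enumerate(sequence)
    )


if __name__ == "__main__":
    # The cytosine at index 1 is methylated and therefore protected.
    print(bisulfite_convert("ACGC", {1}))
```

After such a conversion, a cytosine that still reads as cytosine in the labeled strand indicates a methylated position in the original sample.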

The nucleic acid may be high molecular weight DNA. The stretching step may result in consistent base-to-base spacing in the nucleic acid strand. The base-to-base spacing and the label-to-label spacing may be in a range of about 3 Å to about 7 Å, and specifically may be 5 Å. The nucleic acid strand may be stretched to a length of at least about 25 μm. Only a subset of bases of the nucleic acid strand may be labeled. The subset of bases may include thymines and cytosines. The subset of bases may include adenine and guanine.

The tool may be a needle such that a tip of the needle has been functionalized with a first coating that binds to a nucleic acid strand. The needle may be fabricated from a material such as glass, gold, tungsten, polymethyl methacrylate (PMMA), polystyrene, PVC, and silicon. The coating may be a compound such as PMMA, polystyrene, PVB, and oligonucleotides. The needle may be functionalized with a second coating that does not bind to nucleic acids. The second coating may be a compound such as octanethiol, hexanethiol, nonanethiol, decanethiol, and septanethiol.

The tip of the needle may have a diameter that is less than about 200 nm. The needle may be moved into and out of the solution at a rate in a range of about 1 nanometer/second to about 100 meters/second. The needle may be moved into the solution at a depth having a range of about 0 nm to about 20 μm.

The at least one labeling compound may be a high-Z atom labeling compound. The high-Z atom labeling compound may be one or more compounds such as Os-bipy, mercuric acetate, and platinum dimethylsulfoxide. The at least one labeling compound may be a high-Z atom cluster label.

The attaching step may include attaching at least one labeled nucleic acid strand to an imaging substrate by employing shelf threading. The attaching step may include attaching at least one labeled nucleic acid strand to a support substrate by employing shelf threading followed by transferring the at least one labeled nucleic acid strand on the support substrate to an imaging substrate by employing transfer printing. The attaching step may include attaching at least one labeled nucleic acid strand to a support substrate by employing gap threading. The attaching step may include attaching at least one labeled nucleic acid strand to a support substrate by employing gap threading followed by transferring the at least one labeled nucleic acid strand on the support substrate to an imaging substrate by employing swipe printing.

According to an even further aspect of the invention, a method for the controlled placement of at least one nucleic acid strand onto a substrate may include providing a solution containing a plurality of nucleic acid strands, inserting a tip of a needle into the solution, pulling the tip of the needle out of the solution containing a plurality of nucleic acid strands, where the tip of the needle has been functionalized with a first coating that binds to nucleic acids, stretching the nucleic acid strand into empty space such that the single strand of nucleic acid is suspended between an air/solvent interface and the tip of the needle, and attaching at least one stretched nucleic acid strand to a substrate. The nucleic acid may be high molecular weight DNA.

The needle may be fabricated from a material such as glass, gold, tungsten, PMMA, polystyrene, PVC, and silicon. The first coating may be a compound such as PMMA, polystyrene, PVB, and oligonucleotides. The needle may be functionalized with a second coating that does not bind to nucleic acids. The second coating may be a compound such as octanethiol, hexanethiol, nonanethiol, decanethiol, and septanethiol.

The tip of the needle may have a radius of curvature that is less than about 200 nm. The tip of the needle may be moved into and pulled out of the solution at a rate of about 1 nm/s to about 100 m/s. The tip of the needle may be moved into the solution at a depth having a range of about 10 nm to about 20 μm. The needle may be placed on a single nanopositioner-driven support. A plurality of needles may be inserted into the solution simultaneously with the nanopositioner-driven support.

The stretching step may result in consistent base-to-base spacing in the nucleic acid strand. The base-to-base spacing may be in a range of about 3 Å to about 7 Å. The base-to-base spacing and the label-to-label spacing may be about 5 Å. The nucleic acid may be stretched to a length of about 100 μm.

At least two nucleic acid strands attached to the substrate may be oriented substantially parallel to each other. The attaching step may include employing shelf threading to attach the at least one nucleic acid strand to the substrate. The attaching step may include using gap threading to attach the at least one nucleic acid strand to the substrate. The substrate may be an imaging substrate. The substrate in said attaching step may be a support substrate.

According to another aspect of the invention, a method is provided for analyzing a nucleic acid sequence stored in a memory, where the sequence was determined by labeling a plurality of nucleic acid strands with at least one labeling compound, binding a single labeled nucleic acid strand from a solution containing a plurality of labeled nucleic acid strands onto a tool, stretching the single labeled nucleic acid strand into space such that the single labeled strand of nucleic acid is suspended between an air/solvent interface and a tip of the tool, attaching the stretched labeled nucleic acid to a substrate, and imaging the labeled nucleic acid strand using electron microscopy. The nucleic acid sequence may be a genomic sequence of a human subject.

The analyzing may include determining at least one of the presence or absence of one or more single nucleotide polymorphisms, copy number, variants, indels, rearrangements, or whole genome comparisons. The memory may be a medium selected from the group consisting of hard or floppy disks, optical media, compact disc (CD), digital video disc (DVD), semiconductor media, and flash memory.

According to a further aspect of the invention, a needle may include a distal end, a proximal end, and a shaft extending between and in fluid communication with said distal end and said proximal end, where the proximal end includes a tip member having a radius of curvature less than about 200 nm and where the tip member has been functionalized with a compound that binds to the end of a nucleic acid. The needle may be coated with a second compound that does not have an affinity for binding to a nucleic acid. The tip member may be functionalized with a compound such as PMMA, polystyrene, PVB, silanization, oligonucleotides, telomeres, and restriction site overhangs. The nucleic acid may be single stranded DNA. The needle may be composed of a compound such as glass, gold, tungsten, PMMA, polystyrene, PVC, and silicon. The needle may be disposed on a single nanopositioner-driven support.

Additional features, advantages, and embodiments of the invention may be set forth or apparent from consideration of the following detailed description, drawings, and claims. Moreover, it is to be understood that both the foregoing summary of the invention and the following detailed description are exemplary and intended to provide further explanation without limiting the scope of the invention as claimed.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings, which are included to provide a further understanding of the invention, are incorporated in and constitute a part of this specification, illustrate embodiments of the invention, and together with the detailed description serve to explain the principles of the invention. No attempt is made to show structural details of the invention in more detail than may be necessary for a fundamental understanding of the invention and various ways in which it may be practiced.

FIG. 1 is a flow chart illustrating a method for sequencing a nucleic acid accurately using electron microscopy according to principles of the invention.

FIG. 2 is a schematic showing a needle functionalized with coating 1, which is capable of binding to a nucleic acid strand, and coating 2, which does not bind to a nucleic acid strand.

FIG. 3 shows a schematic of the radius of curvature of the sharp needle.

FIG. 4 is a schematic illustration showing a method according to principles of the invention for extending a nucleic acid strand into empty space. Panel I shows a tip of the needle and a droplet of solution containing a plurality of nucleic acid strands. Panel II shows the tip of the needle moving into the droplet of solution containing the nucleic acid strands and binding to a single nucleic acid strand. Panel III shows the nucleic acid strand being stretched out into empty space.

FIG. 5 is a schematic showing the maximum dipping depth of the needle tip into the droplet of solution containing the nucleic acid strands (Panel I) and the minimum dipping depth (Panel II).

FIG. 6 is a schematic showing the minimum dipping depth of the needle tip into the droplet of solution containing nucleic acid strands. The expanded view schematically shows a needle tip in a droplet of solution specifically binding to a single nucleic acid strand, which may be in solution or at the atmosphere/solution interface.

FIG. 7 is a schematic showing the method of shelf threading according to principles of the invention. Panel I shows the sharp needle dipping into the solution containing the labeled nucleic acid strands. Panel II shows the sharp needle withdrawing from the solution and stretching the attached nucleic acid out into empty space. Panel III shows the extended nucleic acid coming into contact with the TEM grid. Panels IV and V show the sharp needle pulling back to release the nucleic acid from the tip of the sharp needle.

FIG. 8 is a schematic representation of the shelf threading method of the invention. Panel I shows shelf threading employing a single needle and Panel II shows shelf threading employing a plurality of needles.

FIG. 9 is a schematic showing that the extended nucleic acid strand is oriented normal to the droplet surface.

FIG. 10 is a schematic illustrating that the nucleic acid extended in empty space is brought substantially into contact along its length with the support substrate when the strand is placed upon it.

FIG. 11 is a schematic showing elongation of a nucleic acid strand by proximal set down followed by more stretching.

FIG. 12 is a schematic showing large parallel arrays of closely spaced nucleic acid strands, such as DNA strands, that can be formed by repeating the basic programmed piezo-actuator-controlled needle motion order of dipping-in, dipping-out, setting-down, dragging, lifting-up, and translating. The nucleic acid strands in this figure are not depicted as straight as the strands would be in actual practice.

FIG. 13 is a schematic showing a failure mode where improper consideration of solution surface/support substrate/needle motion angles will induce uncontrolled contact between the support substrate and the suspended strand prior to needle-substrate contact, which in turn will cause strand breakage through overstretching.

FIG. 14 is a schematic showing a failure mode where wrongly calibrated solution surface/support substrate/needle motion angles will not allow the strand to be brought substantially in contact with the substrate, leaving a significant portion of it suspended in empty space between the solution surface and the point of needle-substrate contact.

FIG. 15 is a schematic showing the gap threading method according to principles of the invention.

FIG. 16 is a schematic illustrating the transfer printing method according to principles of the invention following deposition of the nucleic acids onto the support substrate.

FIG. 17 is a schematic showing the nucleic acid strands on the substrate embedded in a top layer prior to imaging by electron microscopy.

FIG. 18 is a schematic illustrating a system for sequencing a nucleic acid according to principles of the invention.

FIG. 19 is a schematic illustrating a hypothetical system for sequencing a nucleic acid according to principles of the invention; a simulated image of a single osmium-labeled molecule of ssDNA generated using principles of the invention.

FIG. 20 is a schematic illustrating the ambiguity in images inherent in alternative preparation methods.

FIG. 21 is an illustration showing a simple mode droplet holder.

DETAILED DESCRIPTION OF THE INVENTION

It is understood that the invention is not limited to the particular methodology, protocols, and reagents, etc., described herein, as these may vary as the skilled artisan will recognize. It is also to be understood that the terminology used herein is used for the purpose of describing particular embodiments only, and is not intended to limit the scope of the invention. It also is to be noted that, as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include the plural reference unless the context clearly dictates otherwise. Thus, for example, a reference to “a molecule” is a reference to one or more molecules and equivalents thereof known to those skilled in the art.

Unless defined otherwise, all technical and scientific terms used herein have the same meanings as commonly understood by one of ordinary skill in the art to which the invention pertains. The embodiments of the invention and the various features and advantageous details thereof are explained more fully with reference to the non-limiting embodiments and/or illustrated in the accompanying drawings and detailed in the following description. It should be noted that the features illustrated in the drawings are not necessarily drawn to scale, and features of one embodiment may be employed with other embodiments as the skilled artisan would recognize, even if not explicitly stated herein.

Any numerical values recited herein include all values from the lower value to the upper value in increments of one unit provided that there is a separation of at least two units between any lower value and any higher value. As an example, if it is stated that the concentration of a component or value of a process variable such as, for example, size, angle size, pressure, time and the like, is, for example, from 1 to 90, specifically from 20 to 80, more specifically from 30 to 70, it is intended that values such as 15 to 85, 22 to 68, 43 to 51, 30 to 32, etc. are expressly enumerated in this specification. For values which are less than one, one unit is considered to be 0.0001, 0.001, 0.01 or 0.1 as appropriate. These are only examples of what is specifically intended and all possible combinations of numerical values between the lowest value and the highest value enumerated are to be considered to be expressly stated in this application in a similar manner.

Moreover, provided immediately below is a “Definition” section, where certain terms related to the invention are defined specifically. Particular methods, devices, and materials are described, although any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the invention. All references referred to herein are incorporated by reference herein in their entirety.

Definitions

A is Adenine

C is Cytosine

G is Guanine

T is Thymine

U is Uracil

ssDNA is single stranded DNA

AFM is Atomic Force Microscope

CCD is Charge Coupled Device

CMOS is Complementary Metal Oxide Semiconductor

DMSO is Dimethyl Sulfoxide

EDTA is Ethylenediaminetetraacetic acid

HAADF is High Angle Annular Dark Field

IMPREST is Individual Molecule Placement Rapid Empty Space Threading

PFGE is Pulsed Field Gel Electrophoresis

Os-bipy is Osmium tetroxide 2,2′-bipyridine

PDMS is Polydimethylsiloxane

PLD-UHV is Pulsed Laser Deposition-Ultra High Vacuum

PMMA is Polymethyl methacrylate

PMT is Photo Multiplier Tube

PVB is Polyvinyl butyral

SEM is Scanning Electron Microscopy

STEM is Scanning Transmission Electron Microscopy

TE is Tris EDTA

TEM is Transmission Electron Microscopy

UHV is Ultra High Vacuum

The term “Z,” as used herein, refers to the number of protons in the nucleus of an atom, also known as atomic number. “High-Z” refers to an atomic number greater than that of the imaging thin-film, but for practical sequencing means higher than about 30, or preferably higher than about 70, or more preferably higher than about 90.

The term “consistent,” as used herein, generally means that the spacing between the bases of the stretched nucleic acid strand is relatively the same throughout the length of the stretched nucleic acid strand. The spacing “between” bases is generally the distance from center to center, or phosphate to phosphate. Bases are consistently spaced when in an electron microscopic image of the strand the order of labeled and unlabeled bases can be determined over a specified length, e.g., about 50 bases, about 100 bases, about 1000 bases, or about 10,000 bases.
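
For illustration only, one simple numeric proxy for consistent spacing is to check that every neighboring pair of bases falls within a fixed window. The Python sketch below uses the 3 Å to 7 Å range recited in the summary as that window. The function name `spacing_is_consistent` and the input format are hypothetical, and this proxy is an assumption of the sketch: the definition above is functional (the order of labeled and unlabeled bases can be read) rather than a fixed numeric threshold.

```python
def spacing_is_consistent(base_positions, low=3.0, high=7.0):
    """Return True if every neighboring pair of bases is low..high apart.

    base_positions: center-to-center positions of consecutive bases along
        the strand, in angstroms, in increasing order (hypothetical input).
    low, high: allowed base-to-base spacing window in angstroms.
    """
    gaps = [b - a for a, b in zip(base_positions, base_positions[1:])]
    return all(low <= gap <= high for gap in gaps)


if __name__ == "__main__":
    print(spacing_is_consistent([0.0, 5.0, 10.2, 15.0]))  # gaps near 5
    print(spacing_is_consistent([0.0, 5.0, 14.0]))        # one gap of 9
```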

The term “contiguous,” as used herein, generally means that the bases in a nucleic acid strand are all within a common boundary and are generally connecting without a break.

The term “consecutive,” as used herein, generally means that the bases in the nucleic acid strand follow one another in uninterrupted succession or order.

The phrases “plurality of strands,” “plurality of nucleic acid strands,” “plurality of DNA strands,” and the like refer to 2 strands, 5 strands, 10 strands, 100 strands, 1000 strands, and so on.

The term “support substrate,” as used herein, refers to any matrix to which the extended and/or stretched nucleic acid polymers can adhere.

The term “imaging substrate,” as used herein, refers to the substrate that will be used directly for electron microscopy imaging. The imaging substrate may be placed into the microscope and, optionally, support the imaging thin-film. The imaging substrate may include holey or lacey formvar mesh, other polymer meshes, thin silicon nitride films containing holes, and other suitable types of grids. The imaging substrate is generally thicker than the imaging thin-film but is needed to support the delicate imaging thin-film. The imaging substrate, as used herein, may refer to the imaging thin-film and the support structure holding it or only the support substrate. See M. Hayat, Principles and Techniques of Electron Microscopy: Biological Applications, 4th edition, which describes general methods in electron microscopy.

The term “imaging thin-film,” as used herein, refers to a thin layer of carbon, boron, lithium, hydrogen, beryllium, aluminum, or other low-Z elements and/or nitrides and oxides thereof, and any combination thereof. The layer may have a thickness in a range of about 0.2 nm to about 30 nm. The imaging thin-film may be supported on an imaging substrate or other surface.

The term “substantially parallel” generally refers to the geometrical concept of two straight lines never meeting. As used herein, substantially parallel refers to extended nucleic acid polymers that do not meet or cross over the area where sequence data is to be determined.

The term “nucleic acid,” as used herein, includes oligonucleotides and polynucleotides, and refers to DNA or RNA of genomic, recombinant, or synthetic origin, which may be single- or double-stranded and may represent the sense or antisense strand, or to any DNA-like or RNA-like material, natural, recombinant, or synthetic in origin.

The term “complementary,” as used herein, includes the natural hydrogen bonding of polynucleotides under permissive salt and temperature conditions by base-pairing. For example, the sequence “A-G-T” binds to the complementary sequence “T-C-A.”

Complementarity between two single-stranded molecules may be partial, in which only some of the nucleic acids bind, or it may be complete when total complementarity exists between the single stranded molecules. The degree of complementarity between nucleic acid strands has significant effects on the efficiency and strength of hybridization between nucleic acid strands.

The term “sample,” as used herein, refers to biological material that contains nucleic acids, such as tissue or fluid from a human or animal including, but not limited to, plasma, serum, spinal fluid, lymph fluid, the external sections of the skin, respiratory, intestinal and genitourinary tracts, tears, saliva, blood cells, tumors, organs, tissues; as well as samples from plants, fungi, bacteria, pathogens and in vitro cell cultures. A sample may be obtained from any species that contains nucleic acids, phylogenetically encompassing all viruses, prokaryotes, and eukaryotes. A sample may also include nucleic acids that have been artificially synthesized using techniques known in the art such as solid-phase synthesis or synthesized in vitro using, for example, PCR.

The term “about,” as used herein, when used to describe a range of values, applies to both the upper limit and lower limit of the range. For example, the phrase “ranges from about 10 to 100” has the same meaning as “ranges from about 10 to about 100.” Moreover, when referring to distance, the term “about” generally means +/−10%. For example, the phrase “about 5 nm” means 5 nm +/−10%.

The terms “contrast agent,” “label,” or “labeling compound,” as used herein, generally refer to an atom, molecule, cluster, or material that has a higher atomic number (Z) and/or density and/or differential electron scattering than the imaging thin-film material and unlabeled DNA. The “contrast agent,” “label,” or “labeling compound” may be a compound of different contrast than the bases of the nucleic acid strand itself and is attached to the nucleic acid strand.

The term “cluster,” as used herein, generally refers to a chemical structure comprising two or more high-Z atoms, which is attached to a nucleic acid strand either base-selectively or base-specifically.

Overview

The invention generally relates to methods, devices and articles of manufacture for determining nucleic acid sequences using electron microscopy by direct inspection of labeled, stretched nucleic acids. In a particular embodiment, the invention relates to methods including controlled placement of a nucleic acid onto a substrate or support using a tool to pull out single strands of nucleic acid from a solution. The methods of the invention allow for greater accuracy, lower cost, and longer read lengths than current sequencing technology. For example, the sequencing methods of the invention allow accurate determination of at least about 20 consecutive nucleic acid bases using electron microscopy, preferably at least about 50 consecutive bases, more preferably at least about 1,000 consecutive bases, even more preferably at least about 10,000 consecutive bases, even more preferably at least about 100,000 consecutive bases, and even more preferably at least about 1,000,000 bases of a nucleic acid sample. “Consecutive bases” in this context refers to the order of bases in the DNA starting material that is analyzed.

Using the methods of the invention it is also possible to generate sequence more rapidly than possible using synthetic methods. Methods of the invention (combined with high speed EM) may allow for imaging at least about 10,000 bases/second, preferably at least about 100,000 bases/second, and more preferably at least about 200,000 bases/second. For example, using TEM imaging, DNA strands arrayed according to the invention may be imaged at high resolution at a rate of about 1 μm² per second. A 1 μm² area containing nucleic acid strands may thus be imaged in 1 second, corresponding to an imaging rate of about 500,000 bases per second.
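The figure of about 500,000 bases per second can be checked with simple arithmetic. The following is a minimal sketch, not part of the described method: it takes the base-to-base spacing of about 5 Å and the 1 μm² per second imaging rate from this description, and it assumes, purely for illustration, a hypothetical center-to-center pitch of 4 nm between parallel strands (a value the description does not state).

```python
# Back-of-envelope check of the imaging rate quoted in the text.
BASE_SPACING_NM = 0.5    # about 5 angstroms per base (from the description)
FIELD_SIDE_NM = 1000.0   # side of the 1 um^2 field imaged per second (from the description)
STRAND_PITCH_NM = 4.0    # ASSUMPTION: hypothetical spacing between parallel strands


def bases_per_field(field_side_nm, base_spacing_nm, strand_pitch_nm):
    """Number of bases in a square field filled with parallel, stretched strands."""
    bases_per_strand = field_side_nm / base_spacing_nm
    strands_per_field = field_side_nm / strand_pitch_nm
    return bases_per_strand * strands_per_field


rate = bases_per_field(FIELD_SIDE_NM, BASE_SPACING_NM, STRAND_PITCH_NM)
print(f"{rate:,.0f} bases per second")
```

Under these assumptions each 1 μm strand segment carries 2,000 bases and 250 strands fit side by side in the field; a wider strand pitch lowers the rate proportionally.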

FIG. 1 is a flow chart illustrating a method for sequencing a nucleic acid accurately using electron microscopy according to principles of the invention. This figure is provided for illustration and is not intended to limit the invention. In step 102, a nucleic acid-containing sample, containing a nucleic acid sequence(s) of interest, is obtained from a subject. In step 104, using techniques known to those of skill in the art, the nucleic acid of interest is isolated from the sample. In step 106, specific bases of the isolated nucleic acid are labeled with, for example, high-Z atoms to generate a high-Z atom labeled nucleic acid polymer. In step 108, the nucleic acid polymer is stretched into empty space to ensure consistent base to base spacing of nucleotides. In step 110, nucleic acid polymers are attached to a support and laid out in a non-overlapping pattern (e.g., substantially parallel relative to each other). In step 112, the attached nucleic acid polymers are imaged by electron microscopy to determine the positions of the labels along the polymer. The image is captured by a detector (e.g., CCD or CMOS camera, or PMT) and the positions of the labels are recorded, typically on a computer readable medium. In step 114, an algorithm is employed to use the spacing information to determine the base sequence information from the nucleic acid polymer. These steps are described in greater detail below.

Nucleic Acid Preparation

In step 104, the nucleic acid of interest may be isolated using methods well known in the art, with the choice of a specific method depending on the source, nature of nucleic acid, and similar factors. The nucleic acid of interest may be naturally occurring and/or of genomic origin, not of synthetic or recombinant origin, and may include oligonucleotides or polynucleotides in either double stranded or single stranded form. Alternatively, the nucleic acid strand of interest may be of recombinant or synthetic origin, which may be single stranded or double stranded. In one specific embodiment, the nucleic acid is DNA, and in particular, a very high molecular weight DNA having greater than about 100 kilobases. In some embodiments the high molecular weight DNA is at least 300 kilobases in length. Methods for isolation of very high molecular weight DNA are known (see, e.g., Murry and Thompson, 1980, Nucleic Acids Research 10:4321-5; and Kovacic, R., ET AL., 23(19) NUC. ACIDS RES. 3999-4000 (1995)).

According to one method, very high molecular weight DNA is isolated from eukaryotic cells embedded in agarose plugs to minimize shearing. The DNA is separated from other cellular components by pulsed-field gel electrophoresis (PFGE) and subsequently electro-eluted from the agarose into TE buffer at a concentration in a range of about 0.01 ng/μl to about 0.5 ng/μl. The DNA may be denatured into single stranded form from double stranded form using thermal or chemical denaturation methods known to those of skill in the art. See Barnes, W., 91 PNAS 2216-2220 (1994). For example, thermal denaturation of double stranded DNA may be carried out by heating the DNA sample to 94° C. for 2 minutes. The denaturation step may take place before labeling and may take place before threading the nucleic acid. If sequencing is carried out using ssDNA, it is desirable to convert dsDNA into ssDNA prior to labeling; however, this is not necessary, and dsDNA may be labeled, as understood by a skilled artisan.

Nucleic Acid Labeling

In step 106, specific bases of the nucleic acid are labeled with contrast agents, such as high-Z atoms, for efficient detection by electron microscopy. The nucleic acid should be associated with electron dense atoms in a manner that is at least partially base specific. A contrast agent or label that is partially base specific (i.e., “base selective”) will preferentially associate with one or more of the four DNA or RNA bases over the others (e.g., stains A strongly, G less strongly, T not at all). A contrast agent is completely base specific if it associates essentially with only one base (e.g., A) or sequence (e.g., a particular dimeric sequence). A considerable number of methods for labeling DNA are known and can be used in the invention. For example, base-specific and/or base-selective heavy metal staining protocols are described by Whiting ET AL., 474 BIOCHIMICA ET BIOPHYSICA ACTA 334-348 (1977), Jelen ET AL., 10 GEN. PHYSIOL. BIOPHYS. 461-473 (1991), and Dale ET AL., 14(11) BIOCHEMISTRY 2447-2457 (1975), all of which are herein expressly incorporated by reference in their entirety. The invention should not be construed to be limited to any particular method of nucleic acid labeling and, as appreciated by those skilled in the art, many methods have been described in the scientific literature.

Depending on the method used, each nucleic acid base may be exclusively labeled with a different high-Z labeling compound, selected subsets of nucleic acid bases may be labeled with different high-Z labeling compounds, or selected subsets of nucleic acid bases may be labeled with the same high-Z labeling compound, such as Os-bipy (as described below). According to one specific embodiment, each DNA base, for instance A, G, C, and T, is exclusively or preferentially labeled with a different high-Z labeling compound. A large variety of high-Z labeling compounds may be employed in the methodologies of the invention including, but without limitation, compounds that contain Pt, Hg, I, Rh, Au, Ir, Ag, Os, and the like. Different bases may be distinguished, for example, based on differences in the high-Z labeling agents or between high-Z agents and unlabeled bases, which should be nearly invisible. For example, Os-bipy and iodine can be distinguished based on their scattering cross-sections. Also, for example, a single Au atom bound to all A bases can be distinguished from a three-Au cluster bound to all G bases.

Because, in the method of the invention, the TEM imaging detects numerous non-overlapping DNA strands with consistent base spacing, the fidelity of labeling is not critical. Thus, one advantage of the methods of the invention is that the nucleic acid can be labeled chemically and that, for a given reaction batch, it is not necessary that all target bases in each strand be labeled (for example, it is not necessary that each T in a strand be labeled when T-specific labeling is used). Complete label accuracy is not necessary: because the number of unlabeled bases between the labeled bases can be determined, multiple images of the same strand can be combined to determine the underlying sequence. For illustration, consider a specific genomic sequence that is represented twenty times in a batch of DNA (i.e., twenty molecules all sharing a region common to that position are present in the batch). For example, the batch may be adjusted to contain about 20 genome equivalents of total DNA. Assume that the base identity of a specific position in the sequence is T, so that all twenty molecules contain a T at that position. The batch of DNA containing these molecules is subjected to a particular reaction condition known to label any given T between 90% and 100% of the time and to label any given A, C, or G between 0% and 10% of the time. Upon imaging, that position may be seen as labeled in nineteen of the molecules and unlabeled in one of the molecules. Similarly, a position whose identity is “A” may be labeled in one of the molecules and unlabeled in nineteen of the molecules. A probabilistic treatment will assign the correct identity to that position in the genome with extreme accuracy. In this case, the identity of the position labeled in 19/20 molecules is “T” and the identity of the position labeled in 1/20 molecules is “not T.” Furthermore, alignment and joining of sequences can be effected by probabilistic treatments known to those skilled in the art.
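One way such a probabilistic treatment could look is a simple Bayesian calculation over the twenty observations. The sketch below is illustrative only and is not the algorithm of the invention: the labeling probabilities of 0.95 and 0.05 are assumed midpoints of the 90%-100% and 0%-10% ranges given above, the prior of 0.25 assumes equal base frequencies, and the function name `posterior_is_t` is hypothetical.

```python
from math import comb


def posterior_is_t(labeled, total, p_label_t=0.95, p_label_other=0.05, prior_t=0.25):
    """Posterior probability that a position is T, given that `labeled` of `total`
    aligned molecules show a label at that position (binomial likelihoods)."""
    unlabeled = total - labeled
    # Likelihood of the observation if the position really is T.
    like_t = comb(total, labeled) * p_label_t**labeled * (1 - p_label_t)**unlabeled
    # Likelihood of the observation if the position is A, C, or G.
    like_other = comb(total, labeled) * p_label_other**labeled * (1 - p_label_other)**unlabeled
    evidence = prior_t * like_t + (1 - prior_t) * like_other
    return prior_t * like_t / evidence


print(posterior_is_t(19, 20))  # position labeled in 19 of 20 molecules
print(posterior_is_t(1, 20))   # position labeled in 1 of 20 molecules
```

With these assumed parameters the first posterior is very close to 1 and the second very close to 0, which matches the “T” and “not T” assignments in the example above.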

Another advantage of the methods of the invention is that sequence information for complementary strands can be derived, which also provides additional statistical support for the validity of a given base determination, i.e., high confidence in the positions of T on one molecule and high confidence in the positions of A on the complementary strand go hand-in-hand. Thus, for example, it is possible to confidently determine positions of all A and T in a sequence from a compound/reaction condition batch that only allows for good discrimination of T in terms of labeling chemistry. The labeled Ts on each strand define A positions on the complementary strand.
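The relationship between T positions on one strand and A positions on its complement can be sketched as an index mapping. This is a minimal illustration under stated assumptions: both strands cover the same region, positions are 0-based and counted 5′ to 3′ on each strand, and the strands are antiparallel so that index i on the complement pairs with index (length − 1 − i) on the first strand. The function names are hypothetical.

```python
def a_positions_from_complement(t_positions_on_complement, length):
    """Map T positions read on the complementary strand to A positions on the first strand."""
    return sorted(length - 1 - i for i in t_positions_on_complement)


def at_pattern(t_positions, t_positions_on_complement, length):
    """Combine T calls from both strands into one A/T pattern ('0' = neither A nor T)."""
    pattern = ["0"] * length
    for i in t_positions:
        pattern[i] = "T"
    for i in a_positions_from_complement(t_positions_on_complement, length):
        pattern[i] = "A"
    return "".join(pattern)


# Hypothetical 6-base strand ATGCTA; its reverse complement is TAGCAT.
print(at_pattern([1, 4], [0, 5], 6))  # AT00TA
```

In practice the two strands would first have to be recognized as complements and brought into register, which this sketch takes as given.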

Therefore, in some embodiments, only a subset of nucleic acid bases is labeled. For example, aliquots of DNA are separately labeled in a manner that is at least partially base specific. In one approach, a solution containing a plurality of nucleic acid molecules is reacted under conditions that label at least about 70%, sometimes at least about 80%, and sometimes at least about 90% to about 100% of one or more specific nucleotide bases (A, T, G or C) and less than 20%, preferably less than 10%, of at least one nucleotide base. For example, the solution is reacted under conditions where about 90% to about 100% of T and C are labeled while labeling a small percentage of A and G.

In one embodiment, the DNA aliquots may be labeled with Os-bipy using different conditions in order to achieve different base-specific labeling densities. See Example 1, infra. In this approach, the DNA of interest is isolated and divided into two solutions, i.e., solutions 1 and 2, at a concentration in a range of about 0.01 ng/μl to about 1 ng/μl in each solution. Each solution is reacted with Os-bipy using different conditions (as described in further detail below) in order to achieve different base-specific labeling densities. Furthermore, selected solutions are subjected to a pre-treatment prior to reacting with Os-bipy, such as a bisulfite pre-treatment, as described below and in Example 1, infra.

Solution 1 is reacted for 20 hours at 26° C. with a four-fold molar excess of osmium tetroxide and of 2,2′-bipyridine in TE buffer pH 8.0 with 100 mM Tris and 10 mM EDTA; these conditions label about 100% of T's, about 85% of C's, about 7% of G's, and about 0% of A's. Solution 2 is reacted under the same conditions as Solution 1 except that the reaction only proceeds for 15 minutes, and only a 2.5-fold molar excess of osmium tetroxide and of Os-bipy is used; these conditions label about 90% of T's, about 8% of C's, about 5% of G's, and about 0% of A's. As described below, using the methods of the invention, it is possible to compare the results for Solution 1 with those obtained for the short incubation time reaction of Solution 2, and thereby determine the base-specific pattern for C, and by extension for G on the complementary strand. It is likewise possible, using the methods of the invention, to determine the base-specific pattern for T, and by extension for A on the complementary strand. Thus, a single labeling compound with two reaction conditions can be used for determination of the pattern of T, A, G, and C.
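The comparison between the two reaction conditions can be sketched as a per-position rule. The example below is a simplified illustration and not the method of the invention: it assumes that consensus label calls (True for labeled, False for unlabeled) have already been derived for each aligned position under both conditions, and it ignores the residual labeling of G. The function name `call_t_and_c` is hypothetical.

```python
def call_t_and_c(long_reaction_labeled, short_reaction_labeled):
    """Assign T, C, or '0' to each position from consensus label calls.

    long_reaction_labeled:  calls under the 20-hour condition (labels T and C)
    short_reaction_labeled: calls under the 15-minute condition (labels mostly T)
    """
    if len(long_reaction_labeled) != len(short_reaction_labeled):
        raise ValueError("both call lists must cover the same positions")
    calls = []
    for long_hit, short_hit in zip(long_reaction_labeled, short_reaction_labeled):
        if short_hit:
            calls.append("T")   # labeled even under the short reaction
        elif long_hit:
            calls.append("C")   # labeled only under the long reaction
        else:
            calls.append("0")   # labeled under neither: A or G
    return "".join(calls)


# Hypothetical consensus calls for the sequence TCAGTC.
long_calls = [True, True, False, False, True, True]
short_calls = [True, False, False, False, True, False]
print(call_t_and_c(long_calls, short_calls))  # TC00TC
```

The remaining “0” positions would be resolved from the complementary strand, where G and A appear as C and T respectively.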

It is also to be appreciated that the invention allows for determination of patterns of cytosine methylation using Os-bipy. In this case four aliquots (solutions 1, 2, 3 and 4) are used. Solutions 1 and 2 are treated as was Solution 1, above. Solutions 3 and 4 are treated as was Solution 2, above. However, prior to the Os-bipy reactions, Solutions 2 and 4 are first subjected to a bisulfite treatment to convert unmethylated C residues to U. This allows the pattern of methylation to be determined by comparing sequences from solutions treated with bisulfite to those left untreated. See Jelen ET AL., 10 GEN. PHYSIOL. BIOPHYS. 461-473 (1991). Both methylcytosine and U have labeling efficiencies under different conditions in the reaction with Os-bipy that are distinguishable from labeling efficiencies for the canonical four bases. Thus, in order to derive epigenetic modification information for a genomic sample, it is possible either to determine the base-specific pattern of methylcytosine in the context of DNA that has not been treated with bisulfite or the pattern of U in the context of sequencing after bisulfite treatment. After the labeling reaction, unbound osmium is removed by ultrafiltration or dialysis to minimize extraneous heavy atom contamination during the imaging process, and the DNA polymers are diluted to about 0.1 ng/μl in TE buffer pH 8.
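The comparison of bisulfite-treated and untreated sequences can be sketched at the level of base calls. This is a minimal illustration that assumes the per-position calls for both aliquots have already been derived and aligned; it relies only on the conversion rule stated above (unmethylated C is converted to U, so a C that survives treatment is taken to be methylated). The function name is hypothetical.

```python
def methylated_c_positions(untreated_calls, bisulfite_calls):
    """Return 0-based positions called C in both the untreated and the bisulfite-treated read."""
    if len(untreated_calls) != len(bisulfite_calls):
        raise ValueError("both reads must cover the same positions")
    return [
        i
        for i, (untreated, treated) in enumerate(zip(untreated_calls, bisulfite_calls))
        if untreated == "C" and treated == "C"
    ]


# Hypothetical reads of the same region: the C at index 4 resists conversion.
print(methylated_c_positions("ACGTCC", "AUGTCU"))  # [4]
```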

Many other base-specific heavy atom labeling methods are known to those skilled in the art. Some examples are as follows. Beer and Moudrianakis described the use of a diazonium salt compound coupled to uranyl ions for the labeling of guanine residues (Beer, M. and Moudrianakis, E., Determination of Base Sequence in Nucleic Acids with the Electron Microscope, 48(3) PNAS 409-416 (1962)). Robert Whiting used Pt-DMSO [Platinum Dimethylsulfoxide, KPtCl3(DMSO)] to achieve differential labeling suitable for the identification of adenine nucleotides (Studies of Nucleic Acid Sequences by Dark Field Electron Microscopy, Ph.D. Thesis, (1975), University of Toronto, School of Graduate Studies). Mercuration of cytosine residues using mercuric acetate was reported by Dale and coworkers (Direct covalent mercuration of nucleotides and polynucleotides, Dale, R M K, ET AL., 14 BIOCHEMISTRY 2447-2457 (1975)). Another approach to labeling DNA is first to modify it covalently in a base-specific manner so that it accepts heavy metal labels at the modified bases. One example of this approach was described by Seth Rose, who modified adenine and cytosine residues with chloroacetaldehyde so that mercuric acetate or osmium tetroxide could subsequently bind these residues (Rose, S D., Mercuration of Modified Nucleotides: Chemical Methods Toward Nucleic Acid Sequencing by Electron Microscopy, 361 BIOCHIM. BIOPHYS. ACTA 231-235 (1974)).

In a further embodiment, cluster labeling may be employed in the methods of the invention. Cluster labels are label compounds that contain more than a single heavy atom. Cluster labels may be used in protocols that utilize stretching methods that provide sufficient base to base separation. Sufficient separation is necessary in order to obtain sequence data that is not limited by steric hindrance between neighboring attached clusters. Clusters can be attached to oligomers, and the oligomers are then hybridized to DNA through complementary base pairing. In this manner, complementary sequences can be localized using electron microscopy. In order to efficiently achieve full sequence data, information can be combined from imaging of separate batches of very short cluster-labeled oligomers (trimers and tetramers) hybridized to unknown sequences. In particular, the very short cluster-labeled oligomers may have a length in a range of about 5 to about 20 bases. Clusters that can directly label unmodified DNA, i.e., DNA that has the natural composition of bases without unnatural bases containing functional groups that may increase the efficiency of cluster labeling, include the triosmium compound described in Rosenberg ET AL., 689 J. ORGANOMETAL. CHEM. 4729-4738 (2004). Cluster labeling may also be performed on DNA strands in which specific modified nucleotides have functional groups, such as aminoallyl or thiol groups, that allow efficient reaction with commercially available cluster label reagents such as monomaleimido undecagold from Nanoprobes Inc. (Yaphank, N.Y.). Commercially available cluster labeling reagents that can be functionalized with linkers that react with DNA in a base-specific or base-selective manner include triosmium dodecacarbonyl, triruthenium dodecacarbonyl, and tetrairidium dodecacarbonyl (all three available from Sigma Aldrich Corp., St. Louis, Mo.) and monomaleimido undecagold (Nanoprobes, supra). Cluster labeling with sterically hindered clusters can be made more efficient by performing the labeling reactions on DNA that is stretched in solution. Labeling may also be performed inside a fine tube or capillary (about 30 nm to about 1 μm in diameter), where the DNA is elongated in solution within the tube with known techniques (Chan, E., and Goncalves, N., 14(6) GENOME RES. 1137-1146 (2004)) and reacts with a label. This method has the advantage of preventing cross-linking from labels that have more than one binding site (e.g., Nanogold).

In one embodiment, cluster labels may be employed as contrast agents. Known clusters may be attached to nucleic acid polymers in a base-specific or base-selective manner using chemical linkage structures known to bind or modify nucleic acid polymers base-selectively or base-specifically. This approach to cluster labeling may be referred to as “piggybacking.” Piggybacking may be carried out in the following manner, for example. Mercuric acetate may be used to mercurate cytosine, as described in Dale, R., ET AL., 14 BIOCHEMISTRY 2447-2457 (1975), and a cluster compound that will attach to cytosine may be prepared with a mercuric acetate moiety by acetylating the mercury-bridged triosmium cluster (mu3-eta2-c2-t-Bu)Os3(CO)9(mu-Hg)I, which is described in Rosenberg, E., ET AL., 10 ORGANOMETALLICS 203-210 (1991). As another example of piggybacking, phenanthroline is known to form a complex with osmium tetroxide which can be used as a base-selective label (Palecek, E., ET AL., 13(3) J. BIOMOL. STRUCT. DYN. 537-46 (1995)). 5-Aminophenanthroline (Polysciences, Inc., Warrington, Pa.) can be attached via its exocyclic amine group to the succinimidyl ester of diphenylphosphinopropionic acid (Argus Chemicals SRL, Vernio, Italy). The phosphine can be ligated to any of several cluster compounds by techniques known to those skilled in the art (Cheng ET AL., 127 J. STRUCT. BIO. 169-76 (1999)). As still another example of piggybacking, cluster compounds can be derivatized with alkylating moieties, such as sulfonate esters. Relevant synthetic procedures are described in Ermer, S., ET AL., 187 J. ORGANOMET. CHEM. 81-90 (1980). The use of sulfonate esters to alkylate nucleic acid polymers is described in Yi-Zhang ET AL., 32(31) BIOCHEMISTRY 7954-7965 (1993).

The invention provides methods for obtaining sequence information of a nucleic acid polymer by determining the positional sequence of selected bases in a specified region of a nucleic acid (e.g., DNA) strand. By “sequence information” is meant that the positions of one or more nucleic acid bases, both labeled and unlabeled, are known, and by “positional sequence” is meant that the positions of at least one base (e.g., T) relative to other bases are determined. For example, in a 25 base DNA strand in which Ts are labeled, the positional sequence within a 25 base region may be described as follows:

    T000T0000TTT00T00T000T00T

where “0” is a base other than T. Thus, detecting the position of Ts and non-Ts allows one to determine the positional sequence of Ts. As will be understood from this description, the positional sequences of T and/or C and/or G and/or A or combinations thereof can be determined. The invention provides a method for determining the positional sequence of at least one base in a single nucleic acid strand with at least 70% accuracy, alternatively at least 80% accuracy, and often at least 90% accuracy. The positional sequence may be determined in a region comprising at least 100 up to one million bases, sometimes 200 to one million bases, sometimes 1000 to one million bases, sometimes 10,000 to one million bases. In some embodiments the positional sequence is determined in a region of at least 200, at least 1000, at least 5000, at least 10,000, at least 100,000 or at least one million bases of a strand. In one embodiment the positional sequence of at least one base is determined for a region comprising 1000 to 100,000 bases. The accuracy of the method may be determined by re-sequencing DNA of known sequence.
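The positional-sequence notation and an accuracy check against a known sequence can be sketched as follows. This is an illustrative sketch only: the 25-base sequence is a hypothetical one chosen so that its T pattern matches the example above, and accuracy is assumed here to mean the fraction of positions at which the called pattern agrees with the reference, a definition the description does not spell out.

```python
def positional_sequence(sequence, base="T"):
    """Write `base` where it occurs and '0' everywhere else."""
    return "".join(base if b == base else "0" for b in sequence.upper())


def positional_accuracy(called, reference):
    """Fraction of positions at which a called pattern agrees with the reference pattern."""
    if len(called) != len(reference):
        raise ValueError("patterns must have the same length")
    matches = sum(c == r for c, r in zip(called, reference))
    return matches / len(reference)


known = "TACGTACGATTTACTGATCGATACT"        # hypothetical 25-base strand
reference = positional_sequence(known)     # T000T0000TTT00T00T000T00T
called = "T000T0000T0T00T00T000T00T"       # hypothetical read with one missed T
print(reference)
print(positional_accuracy(called, reference))  # 0.96
```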

“Positional sequence” is one type of sequence information. It will be apparent from this disclosure that by comparing positional sequence for individual bases (or combinations of bases) it is possible to obtain more complete sequence information, including the positional sequence of all four bases (i.e., complete sequence) within a region of the strand or genome.
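Combining positional sequences for individual bases into a more complete sequence can be sketched as a column-by-column merge. The sketch assumes the individual patterns are already aligned over the same region; the symbols “N” (no base called) and “?” (conflicting calls) and the function name are conventions chosen for this illustration, not part of the description.

```python
def merge_positional_sequences(patterns):
    """Merge aligned positional sequences (e.g. one per base) into a single sequence.

    patterns: iterable of equal-length strings in which '0' means "not this base".
    """
    patterns = list(patterns)
    if len({len(p) for p in patterns}) != 1:
        raise ValueError("all patterns must have the same length")
    merged = []
    for column in zip(*patterns):
        calls = {symbol for symbol in column if symbol != "0"}
        if len(calls) == 1:
            merged.append(calls.pop())   # exactly one base claims this position
        elif not calls:
            merged.append("N")           # no pattern claims this position
        else:
            merged.append("?")           # two or more patterns disagree
    return "".join(merged)


print(merge_positional_sequences(["T000T", "0C000", "00A00", "000G0"]))  # TCAGT
print(merge_positional_sequences(["T000T", "0C000"]))                    # TCNNT
```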

Advantageously, the sequencing method(s) of the invention do not rely on incorporating modified nucleotides into DNA or a nucleic acid strand. Although the method is compatible with, and may be used with, enzymatically incorporated labels (e.g., incorporated during polymerization), it is more often used with naturally occurring DNA strands that are isolated and labeled directly (using, for example, labels described above and in the literature). Moreover, the present method allows (but does not require) sequence to be determined for a single stranded DNA rather than a double stranded molecule, thus eliminating ambiguities that may arise with other approaches.

Nucleic Acid Suspension

In step 108, individual nucleic acid polymers are stretched into empty space to ensure consistent base to base spacing within the nucleic acid strand. According to one embodiment, individual DNA strands are extended into space (i.e., a substantial portion of the length of the strand is not supported by a substrate or suspended in a solution or buffer). The suspended DNA, essentially free from solution or buffer, can then be transferred to an imaging substrate for imaging. This process may be referred to as DNA threading, discussed in detail below.

Suspension of DNA out into empty space results in consistent base-to-base spacing of the nucleic acid polymer. Conceptually this is analogous to grabbing both ends of a spring with two hands and stretching, such that each loop of the spring is the same distance from the next loop because they all experience the same amount of force. This causes the heavy atom labels that are seen in the electron microscope to be spaced in a manner corresponding to their actual spacing along the nucleic acid polymer and to the spacing of the specific bases to which they are attached. Moreover, the positions of unlabeled bases can be determined based on the consistent spacing.
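Because the spacing is consistent, a measured label position along the strand can be converted to a base index by dividing by the base-to-base spacing. The following is a minimal sketch, assuming a uniform spacing of 0.5 nm (about 5 Å), positions measured in nanometers from the first base, and measurement error smaller than half the spacing; the function name and the sample coordinates are hypothetical.

```python
def labels_to_positional_sequence(label_positions_nm, strand_length_nm,
                                  spacing_nm=0.5, base="T"):
    """Convert measured label positions along a stretched strand into a positional sequence."""
    n_bases = round(strand_length_nm / spacing_nm) + 1
    pattern = ["0"] * n_bases
    for position in label_positions_nm:
        index = round(position / spacing_nm)   # nearest base site
        if 0 <= index < n_bases:
            pattern[index] = base
    return "".join(pattern)


# Hypothetical labels seen near 0.1, 2.1 and 4.4 nm on a 4.5 nm stretch of strand.
print(labels_to_positional_sequence([0.1, 2.1, 4.4], 4.5))  # T000T0000T
```

The runs of “0” between labels give the number of unlabeled bases, which is the information used to align and combine reads from several molecules.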

In a specific embodiment, the DNA is suspended using a tool to which an end of a DNA strand is attached. The tool may be dipped into a solution (typically a droplet) containing a plurality of nucleic acid polymer strands or a single strand. A nucleic acid strand may preferentially bind to the tip of the tool, and as the tool is pulled out of solution, the nucleic acid molecule is suspended in space such that a first end of the nucleic acid strand is in the solution, a second end of the nucleic acid is attached to the tool, and a region between the ends (“suspended region”) is suspended in space. It will be understood that the “end” of the DNA molecule is not necessarily defined by the physical termini of the molecule in solution (e.g., 5′ phosphate and 3′ hydroxyl groups). For example, a 140 kb molecule might have 20 kb at one end in the droplet, 100 kb in the suspended region and 20 kb at the other end bound to the needle. The nucleic acid strand is suspended such that the bases of the nucleic acid polymer strand are extended with consistent base-to-base spacing. Specifically, the base-to-base spacing or periodicity is in a range of about 3 Å to about 7 Å between the bases, which may be measured from center to center of each phosphate.

In one embodiment, the droplet may have a volume in a range of about 0.5 μl to about 50 μl, sometimes a volume in a range of about 1 μl to about 25 μl, sometimes a volume in a range of about 1 μl to about 15 μl, sometimes a volume in a range of about 1 μl to about 10 μl, and sometimes a volume in a range of about 1 μl to about 5 μl.

In one embodiment, at least a portion of the single nucleic acid polymer strand may be suspended in space between the tool and the liquid. The bases of the nucleic acid polymer strand may be extended such that there is consistent base-to-base spacing of the bases. The base-to-base spacing may be in a range of about 3 Å to about 7 Å between the bases, and specifically about 5 Å. The nucleic acid polymer strand may be extended such that the strands are linear.

The tool used to extract the DNA molecule into empty space (e.g., out of a droplet) may be any of a variety of devices so long as it can be used to bind a single nucleic acid strand and suspend it into space. In some embodiments, the tool is a sharp needle, a hollow needle, or a small (i.e., less than about 300 nm in diameter) magnetic particle that has an affinity to bind nucleic acids and is used with a magnetic probe.

In one approach the tool is a sharp needle. The sharp needle may be composed of materials such as glass, gold, tungsten, PMMA, polystyrene, PVC, silicon, or any other suitable substance that may be made into a very sharp needle. The needle tip can be readily made by techniques known to those of ordinary skill in the art, such as using a standard pipette puller, microfabrication (Handbook of Microlithography, Micromachining & Microfabrication, P. Rai-Choudhury, SPIE Optical Engineering Press, 1997), growth, or molding and casting. For example, a glass needle can be made by heating the middle of a glass fiber (about the same diameter as a micro-capillary tube) in an ethanol flame and pulling from either end of the fiber. Alternatively, a standard pipette puller can be used. In one embodiment, as shown in FIG. 2, the needle 200 may have a proximal end 202 including a tip 206 having a diameter less than about 200 nm, a distal end 204, and a shaft 208 extending between the proximal end and distal end. The tip 206 may have a diameter less than, up to about, or greater than about 1 μm. The tip diameter can be determined using SEM. The terminal radius of curvature would be measured in a geometrically consistent manner, e.g., one would identify a rounded area at the tip and would interpose an imaginary circle fitted to the curvature, and one would then measure the radius of that circle by measuring the distance from its center to its periphery. In some cases the rounded arc at the end of the needle will only roughly approximate the outer edge of a circle (FIG. 3). In other cases the end of the needle is broken to resemble a flat mesa about 20 nm to about 500 nm across. A preferred diameter may be less than about 300 nm.
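The circle-fitting measurement described above can be approximated numerically. The sketch below is one simple stand-in for that procedure, not the measurement method of the invention: it takes three points picked on the rounded outline of the tip in an SEM image (coordinates in nanometers, assumed to be already calibrated) and returns the radius of the unique circle through them. The function name and the sample coordinates are hypothetical.

```python
from math import hypot


def tip_radius_from_three_points(p1, p2, p3):
    """Radius of the circle through three (x, y) points on the tip outline."""
    a = hypot(p2[0] - p3[0], p2[1] - p3[1])
    b = hypot(p1[0] - p3[0], p1[1] - p3[1])
    c = hypot(p1[0] - p2[0], p1[1] - p2[1])
    # Twice the triangle area, from the cross product of two edge vectors.
    twice_area = abs((p2[0] - p1[0]) * (p3[1] - p1[1])
                     - (p3[0] - p1[0]) * (p2[1] - p1[1]))
    if twice_area == 0:
        raise ValueError("points are collinear; no circle can be fitted")
    # Circumradius R = abc / (4 * area) = abc / (2 * twice_area).
    return a * b * c / (2 * twice_area)


radius = tip_radius_from_three_points((100.0, 0.0), (0.0, 100.0), (-100.0, 0.0))
print(f"radius = {radius:.1f} nm, diameter = {2 * radius:.1f} nm")
```

Where the arc only roughly approximates a circle, as noted for FIG. 3, a least-squares fit over many outline points would be more robust than three hand-picked points.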

The tip of the needle may be functionalized by coating with a material that preferentially binds to the end of a nucleic acid, and may include, without limitation, PMMA, polystyrene, PVB, chemical treatments such as silanization, oligo- or polynucleotides complementary to genomic sequences or restriction site overhangs (the oligomers may possess degenerately pairing bases such as inosine, allowing for greater selective range), aptamers with high affinity to specific sequences or structures, streptavidin or other proteins with a high affinity to a molecule such as biotin, or any other suitable material that would specifically attach to the ends of the nucleic acid. In one specific embodiment, the needle tip may be coated with PMMA by dipping in about 0.5% PMMA solution in acetone, and drying in an acetone saturated atmosphere.

In a further embodiment, the tip of the needle may be coated with a second coating, to limit the area where the nucleic acid can bind, and may include materials such as octanethiol, nonanethiol, hexadecanethiol, or other linear alkane-thiol chains. In FIG. 2, coating 1 binds to the ends of the DNA, while coating 2 does not bind or binds less avidly. For example, in one approach, a gold needle tip may be dipped into a solution of polystyrene, and then the polystyrene may be crosslinked just at the very tip in an electron beam. Subsequently, the un-crosslinked polystyrene may be removed from the rest of the needle by methods known to those skilled in the art such as dipping in acetone or chloroform. The needle could then be dipped into an octanethiol solution, which will form monolayers on gold but not on the polystyrene. The ends of the DNA will not bind to the octanethiol region.

In another embodiment, magnetic nanoparticle oligonucleotides may be used to specifically label the ends of the nucleic acid polymer strands. In this approach a tool having a magnetized end is dipped into the solution containing the magnetic labeled nucleic acid polymer strands and binds to the end of a single molecule of the nanoparticle labeled nucleic acid polymer strand. As the magnetized tool is pulled out of the solution, the nucleic acid molecule is extended and suspended in space such that a first end of the nucleic acid strand is in the solution and a second end of the nucleic acid is attached to the tool. The nucleic acid strand is suspended such that the bases of the nucleic acid polymer strand are extended with consistent base-to-base spacing, and specifically, the base-to-base spacing is in a range of about 3 Å to about 7 Å between the bases, and in particular about 5 Å.

In one embodiment, the sharp tool may be a hollow needle. In particular, hollow micro-needles may be manufactured with techniques known in the art, so that a nucleic acid solution may be pumped through the bore. Subsequently, the micro-needles may be touched to either a support substrate with an affinity for nucleic acid ends or to a sharper polymer-coated solid needle in order to thread directly out of the hollow bore. Thus, the hollow micro-needles could serve as both direct threading implements and as channels for precise solution control. The bore may be fabricated by techniques known in the art such that only one strand would enter the bore lengthwise, which assists in controlling nucleic acid thread concentration. This technique using a hollow needle is consistent with pulling DNA out of solution using a sharp tool. The difference with using the hollow needle is that the surface of the solution is very small and its shape is constrained by the walls of the hollow needle.

After the sharp needle is functionalized, the nucleic acid is “threaded” or attached onto the sharp needle. In one embodiment, DNA threading is performed by dipping the sharp functionalized needle into and out of the DNA polymer solution, pulling the DNA strands into empty space as shown in FIG. 4. The needle tip may be moved into and out of solution at a rate in a range of about 1 nm/hr to about 10 m/s, and specifically, at a rate in the range of about 1 μm/s to about 10 mm/s. It is appreciated that the DNA strands pulled out remain normal to the surface of the DNA solution. Nanopositioners including piezo-actuators, such as those used in AFM, may be used to control the position and motion of the needle with sub-angstrom precision (using high quality feedback mechanisms). Nanopositioners are known in the art and are described, for example, in U.S. Pat. No. 5,903,085 and are commercially available from Piezosystem Jena (Germany). Variables that affect efficient threading of single strands include solution temperature and pH, humidity of the surrounding atmosphere, the concentration of labeled ssDNA in solution, the speed at which the needle is dipped into and out of the solution, the length of time the needle sits in solution, the sharpness of the needle and coating, and the depth and angle of entry as the needle is dipped into solution.

The “dipping in” distance is a variable that may control the amount of DNA that attaches to the needle (FIG. 5). For example, for high molecular weight DNA having a concentration in a range of about 0.01 ng/μl to about 0.5 ng/μl, the needle may be dipped to a depth in a range of about 1 Å to about 20 μm into the solution. In one embodiment, shallow dipping is used (FIG. 6). With shallow dipping, the needle is placed into the solution over a very short distance (for example, less than about 1 μm) to limit the surface area of the needle available for strand binding. Multiple needles and/or an automated system can be used for threading the needle.

Another variable is the amount of time the needle stays in solution. The number of nucleic acid molecules that attach to the tip may be regulated by controlling the amount of time the tip is allowed to remain in solution. For a given solution, longer needle dwell times will result in more attachments, and shorter times will result in fewer attachments, as a function of the time required for a molecule to diffuse in solution and attach to the needle. This time range can vary from less than about 1 ns to multiple seconds or minutes.

For example, a large number of ultra-sharp needles could be placed on a single nanopositioner-driven support with close spacing so that a single dipping motion could pull a large number of strands simultaneously, with enough separation to avoid interference between strands. A parallel array or “bed” of needles could also be made by microfabrication techniques known by those of skill in the art.

Once the nucleic acid is attached onto the needle, the needle is pulled out of the solution, and thus the nucleic acid bases are extended out into empty space to ensure consistent spacing between the bases. The extended nucleic acid should have a base-to-base spacing in a range of about 3 Å to about 7 Å, and specifically about 5 Å. The spacing of the bases, however, should not be construed to be limited exclusively to this range, as the appropriate spacing of bases will depend on a number of factors, such as whether the nucleic acid is stretched further by use of a shelf (described below). The forces acting between the air-water interface and the needle tip alone should lead to a base-to-base spacing of about 5 Å, corresponding to a force of about 65 pN. The total length of a particular extended strand in empty space will correspond to the number of bases in the strand times the average base-to-base spacing. For example, a strand about 10 million bases long may be stretched to a length of about 5 mm given a 5 Å base-to-base spacing. The nucleic acid strand may be stretched to a length of about 20 nm, sometimes to a length of about 100 nm, sometimes to a length of about 200 nm, and sometimes to a length of about 1 μm. It should be noted that the distribution of lengths in a particular pool of nucleic acid may have to be taken into consideration. This is so that strands are not completely pulled out of solution, so that meniscus forces at the air-solution interface can continue to apply tension at the end of a strand distal to the needle tip. With a solution containing longer nucleic acid polymers, the nucleic acid strands will be pulled greater distances from the solution relative to a solution containing shorter nucleic acid polymers. Sufficient extension to get consistent base-to-base spacing is a linear function of the molecular weight of the nucleic acid molecules. In specific embodiments, the nucleic acid strand may be stretched and extended to a length of at least about 2 μm, more specifically, to a length of at least about 25 μm, even more specifically to a length of at least about 50 μm, and more specifically yet, to a length of at least about 100 μm.
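The relationship between strand length and base-to-base spacing (number of bases times average spacing) can be checked with a short calculation. The following is a minimal sketch only; the function name and the unit constant are illustrative assumptions and are not part of the described method.

```python
ANGSTROM_PER_MM = 1e7  # 1 mm = 10^7 angstroms


def extended_length_mm(num_bases: int, spacing_angstrom: float = 5.0) -> float:
    """Length of a fully extended strand in mm: bases x base-to-base spacing."""
    if num_bases < 0 or spacing_angstrom <= 0:
        raise ValueError("num_bases must be >= 0 and spacing must be > 0")
    return num_bases * spacing_angstrom / ANGSTROM_PER_MM


if __name__ == "__main__":
    # A strand of about 10 million bases at the 3, 5, and 7 angstrom spacings.
    for spacing in (3.0, 5.0, 7.0):
        print(spacing, extended_length_mm(10_000_000, spacing))
```

At a 5 Å spacing, the 10 million base strand evaluates to 5 mm, in agreement with the example given in the text.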

Since the forces involved in stretching the nucleic acid do not break covalent bonds, the nucleic acid will not break during the stretching process. The meniscus forces acting at the air-water interface are strong enough to remove the secondary structure, but not strong enough to break the covalent bonds. If the threading device is arranged and programmed such that additional force is applied beyond that required to pull the DNA out of solution, care must be taken not to break the DNA.

It is desirable to minimize evaporation and/or control the position and angle of the droplet in many embodiments. In the simplest mode, the droplet is held stationary with the careful positioning and configuration of a PDMS holder piece. In FIG. 21, Panel I shows the approach of a droplet to the grid holder piece. Panel II shows the droplet held between two pieces of PDMS such that the droplet will maintain the threading surface front. Even as the droplet evaporates, the threading surface front will stay unchanged, while only the receding surface front moves (Panel III). In another simple mode, a humid chamber is constructed around the threading apparatus such that a state of high (e.g., approaching 100%) humidity is achieved and the droplet will not evaporate.

Alternatively, the nucleic acid strand may be removed from solution and suspended between two points by attaching the nucleic acid strand to two points in solution using optical or magnetic beads (Bustamante, C., ET AL., 421(6921) NATURE 423-427 (2003)). The strand is then freeze dried and the ice is then sublimed away, leaving a suspended single strand that can then be transferred to an imaging substrate. In another embodiment, individual nucleic acid strands are bound to a single magnetic particle, which is then withdrawn from the droplet (e.g., using a magnetic probe) to extend the DNA strand into space.

Attachment to Imaging Substrate

In step 110, once the nucleic acid strands are threaded between the needle and the droplet in empty space and the nucleic acid strand is not surrounded by either solution or buffer, the strands can be placed directly onto a substrate, such as a support substrate or an imaging substrate for imaging by electron microscopy. One approach for doing this is referred to as “shelf threading,” in which extended DNA strands are placed on an imaging substrate or are placed on a support substrate and transferred to an imaging substrate in a separate step. See FIG. 7. A related approach is referred to as “gap threading,” in which DNA strands are suspended across a gap in a support substrate and transferred to an imaging substrate in a separate step. See FIG. 15.

The strands can be placed in a great many orientations, including orientations in two and three dimensions, so long as strands do not overlap or cross in the “suspended region” (or do so rarely). The most convenient and standard method for EM sample preparation is two dimensional, typically with a number of strands positioned parallel to each other. See, e.g., FIG. 8. In this configuration, the ratio of sample to empty or ‘non-sample’ space can be readily maximized. In general, the closer the interstrand spacing the better, as long as the strands are not in such close proximity that they interfere with the visibility and identification of their neighbors. Sub-nanometer precision positioners are commercially available, and a very convenient configuration is to place the strands in parallel lines ranging from about 2 nm to about 10 nm apart. In another embodiment, the strands are positioned radially.

In one embodiment, multiple suspended strands may be placed in a substantially linear orientation. FIG. 12 shows placement of nucleic acid strands 704, such as DNA, on a substrate 1202 by needle 706 from a droplet of solution 702 containing nucleic acid strands 704. Large parallel arrays of closely spaced nucleic acid strands, such as DNA strands, can be formed by repeating the basic programmed nanopositioner-controlled needle motion order of dipping-in, dipping-out, setting-down, optionally dragging the needle tip along the support substrate in order to deposit the strand, lifting-up, and translating over a desired distance between strands. On an industrial, high-throughput scale, millions of strands may be placed onto one support substrate in this manner, particularly using a large array of parallel needles.
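The repeated needle motion order (dipping-in, dipping-out, setting-down, optional dragging, lifting-up, translating) can be written out as a simple step list. The following is a minimal sketch only: the step names, the tuple layout, and the `pitch_nm` parameter are hypothetical and do not correspond to any particular nanopositioner controller interface.

```python
from typing import List, Tuple


def needle_program(num_strands: int, pitch_nm: float,
                   drag: bool = False) -> List[Tuple[str, float]]:
    """Return (step name, lateral position in nm) pairs for a parallel array."""
    steps: List[Tuple[str, float]] = []
    for i in range(num_strands):
        x_nm = i * pitch_nm  # lateral position of the current strand
        steps.append(("dip_in", x_nm))
        steps.append(("dip_out", x_nm))
        steps.append(("set_down", x_nm))
        if drag:
            # Optional: drag the tip along the substrate to deposit the strand.
            steps.append(("drag", x_nm))
        steps.append(("lift_up", x_nm))
        # Move over by the desired interstrand distance.
        steps.append(("translate", x_nm + pitch_nm))
    return steps


if __name__ == "__main__":
    for step in needle_program(num_strands=2, pitch_nm=5.0, drag=True):
        print(step)
```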

Because the method provides DNA strands that are straighter (more linear) than other methods, crowded arrays may be used and accurate sequence determined. Arrays may contain from about 2 to 10 million substantially linear strands and/or parallel strands with a suspension region of length in a range of about 1 μm to about 5 mm and spacing in a range of about 1 nm to about 10 nm, resulting in a density in a range of about 1 base/nm² to about 1 base/5 nm². Arrays may have, for example, more than 5, more than 10, more than 100, more than 1000, or more than 10,000 DNA strands.
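The areal base density of a parallel array follows from the base-to-base spacing along each strand and the spacing between neighboring strands. The following is a minimal sketch, assuming uniformly spaced parallel strands so that each base occupies a rectangle of (base spacing) by (strand spacing); the function name is illustrative.

```python
def base_density_per_nm2(base_spacing_nm: float, strand_spacing_nm: float) -> float:
    """Bases per square nanometer for uniformly spaced parallel strands."""
    if base_spacing_nm <= 0 or strand_spacing_nm <= 0:
        raise ValueError("spacings must be positive")
    return 1.0 / (base_spacing_nm * strand_spacing_nm)


if __name__ == "__main__":
    # 5 angstrom (0.5 nm) base spacing at 2 nm and 10 nm strand spacing.
    print(base_density_per_nm2(0.5, 2.0))   # 1 base per nm^2
    print(base_density_per_nm2(0.5, 10.0))  # 1 base per 5 nm^2
```

Under this assumption, a 5 Å base spacing with strands 2 nm and 10 nm apart gives 1 base/nm² and 1 base/5 nm², respectively.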

Notably, the methods of the invention can produce linear (straight) double stranded and single stranded nucleic acid strands. Linearity can be described in terms of dimensions of an imaginary box that encloses the strand (or linear portion thereof) or, more conveniently, an imaginary rectangle that encloses a two dimensional projection of the strand (or the strand itself) on a supporting substrate. The linear portion of the strand is usually at least about 2 μm in length, more often at least about 5 μm in length, and even more often at least about 10, 20, 30, 50 or 100 μm in length. In some cases the linear portion of a strand will be nearly the entire length of the strand. In one embodiment, if the smallest possible imaginary rectangle were drawn to enclose the strand or linear portion such that all portions of the strand were inside the rectangle, the rectangle would have a length to width ratio of not less than 100:1, or preferably at least 160:1, or preferably at least 200:1, more preferably at least 500:1, and even more preferably at least 2000:1. For example, for a linear portion 5 μm in length, a box drawn to enclose a linear threaded strand on a flat surface would be about 30 nm in width (about 160:1 ratio) or 2.5 nm in width (2000:1 ratio). For a strand that is 10 μm long, the bounding box could be 10 μm by 62.5 nm, making the ratio 160:1. For a linear portion that is 3 μm long, the bounding box could be 4 nm in width, making a ratio of 500:1. In three dimensions the third (depth) dimension of the box would be within a factor of 5 of the dimension of the width, preferably within a factor of 2, and most preferably about the same as the width. Because the strands positioned using the methods of the invention are straight (usually along substantially the entire length of the suspended region), neighboring strands in an array may be placed very close together, such as about 2 nm apart, about 3 nm apart, about 4 nm apart, about 10 nm apart, about 20 nm apart, or about 50 nm apart from the other neighboring strands. In some embodiments a plurality of neighboring strands are less than 50 nm apart, preferably less than 20 nm apart, more preferably less than 10 nm apart, or less than 4 nm apart (e.g., 2-50 nm apart, 2-20 nm apart, 2-10 nm apart, 4-50 nm apart, 4-20 nm apart, or 4-10 nm apart). In a specific embodiment, the linear strands are placed such that they do not cross.
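The length to width ratio of an enclosing rectangle can be estimated from traced strand coordinates. The following is a minimal sketch under a simplifying assumption: the rectangle is aligned with the line through the first and last traced points, which approximates, but is not guaranteed to equal, the smallest possible enclosing rectangle. The function name and the input format (a list of (x, y) points in nm) are illustrative.

```python
import math
from typing import List, Tuple


def bounding_box_ratio(points: List[Tuple[float, float]]) -> Tuple[float, float, float]:
    """Return (length, width, length/width) of an end-to-end aligned rectangle."""
    (x0, y0), (x1, y1) = points[0], points[-1]
    dx, dy = x1 - x0, y1 - y0
    norm = math.hypot(dx, dy)
    if norm == 0:
        raise ValueError("first and last points must differ")
    ux, uy = dx / norm, dy / norm  # unit vector along the strand axis
    # Project each point onto the strand axis and onto its perpendicular.
    along = [(x - x0) * ux + (y - y0) * uy for x, y in points]
    across = [-(x - x0) * uy + (y - y0) * ux for x, y in points]
    length = max(along) - min(along)
    width = max(across) - min(across)
    ratio = length / width if width > 0 else math.inf
    return length, width, ratio


if __name__ == "__main__":
    # A 10 um (10,000 nm) strand wandering +/- 31.25 nm about its axis.
    trace = [(0.0, 0.0), (2500.0, 31.25), (7500.0, -31.25), (10000.0, 0.0)]
    print(bounding_box_ratio(trace))
```

For the trace shown, the rectangle is 10 μm by 62.5 nm, which corresponds to the 160:1 example in the text.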

Shelf Threading

In one approach, shelf threading may be employed for placing DNA directly onto an imaging substrate. See FIG. 7, Panels I-V for an illustration. The imaging substrate may support an imaging thin-film that may be composed of a material such as single graphene sheets, ultra-thin films of carbon, beryllium oxide or beryllium nitride, water ice (requiring cryo-electron techniques), and other suitable forms of low-Z (electron transparent) solids. Since these ultra-thin films are very delicate, they are placed on top of or supported by a grid or mesh composed of standard formvar lacey film on a TEM grid, microfabricated holes in SiN membranes (DuraSiN by Protochips, Inc.), or a similarly fabricated holey grid. FIG. 7 shows a droplet of solution 702 containing a plurality of labeled nucleic acid strands 704 (only one shown for clarity), a needle 706, and a TEM grid 708 on top of a PDMS block 710. As shown in FIG. 7, Panel I, the needle 706 is dipped into the droplet of solution 702. In Panel II, needle 706 is pulled out of solution 702 such that nucleic acid strand 704 is extended out into empty space. In Panel III, extended nucleic acid strand 704 is brought into contact with TEM grid 708. The nucleic acid strand 704 is released from needle 706, as shown in Panels IV-V.

In another approach to shelf threading, illustrated in FIG. 8, the support substrate 710 is placed next to the droplet of nucleic acid polymer solution 702 so that the nucleic acid strand suspended between the droplet and the sharp needle is brought down to contact the support substrate. Then a plurality of nucleic acid strands are transferred to an imaging substrate by “transfer printing,” as described below. FIG. 8, Panel I, schematically illustrates a single needle depositing nucleic acid strands 704 onto a PDMS support substrate 710, and FIG. 8, Panel II, schematically illustrates a single needle 706 depositing a plurality of nucleic acid strands 704 onto a support film covered, fabricated TEM grid 708, one at a time (shown with the needle in two positions of its motion time-line path).

As illustrated in FIG. 9, the nucleic acid strand is always normal to the surface of the droplet. The angle of the solution surface relative to the support substrate and the motion of the needle are controlled such that the nucleic acid 704 in empty space is brought substantially into contact along its length with the substrate 708 when the strand is placed upon it (FIG. 10). The nucleic acid may be elongated after it has been “pulled out” of solution. FIG. 11, Panels I-IV, shows elongation of a nucleic acid by proximal setdown followed by more stretching. In FIG. 11, the nucleic acid has exaggerated looseness to illustrate greater stretching by mechanical force than by pure meniscus forces. “A” and “B” can represent cross-sectional views of a support such as two bars in an EM grid (on a micro scale) or two strips of PDMS (on a macro scale). “A” and “B” can also represent two positions on a planar substrate.

FIGS. 13 and 14 illustrate that several parameters may be considered in extending and placing the DNA strands. In one failure mode, improper consideration of solution surface/support substrate/needle motion angles will induce uncontrolled contact between the support substrate and the suspended strand prior to needle-substrate contact, which in turn will cause strand breakage through overstretching (FIG. 13, Panel II). In another failure mode, wrongly calibrated solution surface/support substrate/needle motion angles will not allow the strand to be brought substantially in contact with the substrate, leaving a significant portion of it suspended in empty space between the solution surface and the point of needle-substrate contact (FIG. 14).

Gap Threading

In another embodiment, gap threading may be employed for placing the nucleic acid strands onto the support substrate. Gap threading may be carried out in the following manner, which is illustrated schematically in FIG. 15. FIG. 15 shows a block of PDMS 1502 with gap 1504, nucleic acid strands 1506 spanning across gap 1504, and an imaging substrate 1508, in this case an ultra-flat silicon grid covered with a carbon film or other low-Z film. First, the nucleic acid droplet is placed next to a gap in a small block of PDMS. Typically, the PDMS block will have a length of about 3 mm and a width of about 3 mm. The nucleic acid is spanned across the gap 1504 by sharp needle threading as described above. The ultra-flat silicon grid 1508 covered with a carbon film, holey gold film, or a continuous film of carbon, beryllium, other low-Z film or imaging thin-film (as defined above) is placed in contact with the nucleic acid stretched across gap 1504, causing the nucleic acid to be transferred to the grid 1508 (also referred to as “swipe printing”). As used herein, the term “swipe printing” refers to bringing one or more nucleic acid strands spanning a gap on a support structure into contact with a structure, which may be another support substrate or an imaging substrate or an imaging support thin-film. The block of PDMS 1502 with DNA 1506 spanning the gap 1504 could also be flipped over and transferred to an imaging support film (not shown) by the method of transfer printing, described below.

Transfer Printing

As discussed above, DNA strands may be placed on a support substrate and subsequently transferred to the imaging substrate or imaging thin-film. One way to do this is by “transfer printing.” Transfer printing is known in the field and is generally described by Nakao, H., ET AL., 125 J. AM. CHEM. SOC. 7162-7163 (2003). Transfer printing may be employed for placing the nucleic acid strands on a support substrate or an imaging support as shown in FIG. 16. FIG. 16 schematically illustrates a block of PDMS 1602 having a length of 3 mm and a width of 3 mm, a droplet 1604 containing nucleic acid 1606, and a needle 1608. The needle 1608 is dipped into the droplet 1604, and a strand of nucleic acid 1606 binds the tip of needle 1608. The needle 1608 is pulled out of droplet 1604, stretching the nucleic acid strand out into empty space. The extended nucleic acid strand 1610 is attached to PDMS block 1602. This process is repeated until a plurality of nucleic acid strands are placed on the PDMS block (Panel 6). Additional care (in the form of greater interstrand spacing) must be taken so that one threaded strand does not interfere with (i.e., touch or cross) a previously threaded strand. Once the desired number of nucleic acid strands have been placed on the PDMS block 1602, PDMS block 1602 is inverted and placed on an imaging thin-film or other transfer substrate 1612. The PDMS block 1602 is removed from the film 1612, while the DNA stays behind on the film 1612, thereby transferring the plurality of nucleic acid strands to the film 1612. The film can be carbon, beryllium, or other low-Z support deposited on a freshly cleaved salt crystal or mica sheet (not shown) when the nucleic acid is transferred, and then put on a TEM grid afterwards for imaging.

Alternatively, the nucleic acid strands may be placed onto a support substrate and transferred onto another support substrate for storage, transport, or for other purposes.

The imaging thin film may be an imaging substrate, which is a thin film composed of, without limitation, carbon, boron, beryllium, aluminum, or other low-Z elements and/or nitrides and oxides thereof, or an imaging thin-film as previously defined. These films may be manufactured by known techniques, such as deposition on a cleaved salt crystal or mica. In one specific aspect, an ultra-thin (about 1.5 nm) carbon film is employed. In another aspect, the imaging substrate is an ultra-flat silicon TEM grid that is covered with a thin (about 1.5 nm thick) supporting carbon film. The imaging thin-film may also be placed on a formvar micro-mesh-coated TEM grid or a machined silicon grid with regular or irregular holes or apertures.

In one embodiment, at least one elongated nucleic acid polymer strand may be disposed on a planar substrate. The at least one elongated nucleic acid polymer strand may have consistent base-to-base spacing over a length of about 1000 base pairs. A film may be disposed on top of the at least one elongated nucleic acid polymer such that the at least one elongated nucleic acid polymer is sandwiched between the planar substrate and the film. The film may be composed of carbon or another low-Z element. The planar substrate may be composed of a material such as PDMS, carbon, boron, lithium, hydrogen, beryllium, aluminum, nitrides, nitride oxides, and combinations thereof.

The methodology described above is not limited to nucleic acid polymers, but can be used with a wide variety of other long (unbranched) high molecular weight molecules. For example, other high molecular weight polymers, including but not limited to nanotubes (e.g., carbon nitrate, boron, boron nitrides, and the like), amino acid chains, microtubules, actin filaments, other long linear polymers with repeating units, and other polymers, may be threaded onto a suitable tool and attached to a suitable substrate. In particular, the methods may be used with a linear polymer that binds differentially at its end (terminus) to the needle or other binding tool. In some embodiments the linear molecule may be modified at a terminus or termini so that the end binds preferentially to the tool. For example, the end of the polymer can be complexed to DNA using techniques known by those of skill in the art, which may then bind preferentially at its end to the tool as described above.

Stabilization

The nucleic acid may be damaged by the electron beam generated by the electron microscope. For this reason, in some embodiments, once placed on the imaging substrate the labeled nucleic acid strands may be stabilized prior to imaging. For example, additional carbon or other low-Z elements or polymers can be placed onto the sample by evaporation, sputtering or direct deposition of a pre-made film. Indirect evaporation of carbon or other low-Z material may be accomplished by ultra-fast PLD-UHV of carbon or beryllium. The presence of this additional layer (i.e., topcoat) increases the stability of the underlying nucleic acid. FIG. 17 schematically illustrates a TEM grid 1702 having a holey mesh 1704, a base layer 1706, nucleic acid 1708, and a top layer 1710.

In one embodiment, the nucleic acid is stabilized by evaporating a topcoat onto the nucleic acid that is made by pulsed laser deposition in a gas atmosphere that cools depositing atoms, but has the conditions of pressure, target to sample distance, pulse length, pulse frequency, and pulse fluence optimized to give a solid homogeneous film embedding the labeled nucleic acid polymers. These conditions will cause depositing atoms to have minimum reaction with each other en route to the substrate, leading to a denser film fully embedding the labels. The purpose of a topcoat is to stabilize the label and/or DNA in a manner that allows sequence data to be determined, and to prevent motion or damage.

The imaging thin-film must be very thin for good images of single atoms to be taken in the TEM. Films that are made by standard techniques often become contaminated after exposure to laboratory air. This contamination buildup causes the films to become thicker, which can prevent imaging of single atoms and clusters. Besides thickening the film, contaminants from the air can cause structural instabilities of the film when interacting with the electron beam. Rigorous cleanliness and care of the films must be maintained to ensure that this buildup does not happen. For all of the methods described herein, all of the steps should preferably be done in a controlled hydrocarbon-free environment (e.g., pure nitrogen, argon, or other inert gases). Once the DNA has been suspended and/or placed on an imaging substrate or transfer substrate like PDMS and removed from solution, all further steps are best continued in a controlled environment, preferably a clean hydrocarbon-free gas, but more preferably UHV (10⁻¹⁰ torr). This can include performing threading in a controlled atmosphere environment, placing the DNA strands on the film in UHV, and other cleanliness techniques known to those skilled in the art of nanotechnology, semiconductor manufacturing, etc. (Krishnan, S.; Laparra, O., “Contamination issues in gas delivery for semiconductor processing,” Semiconductor Manufacturing, IEEE Transactions on, Volume 10, Issue 2, May 1997, Page(s): 273-278; William Whyte, Clean room Technology: Fundamentals of Design, Testing and Operation; Dorothy Hoffman, Handbook of Vacuum Science and Technology).

Contamination from the microscope itself is also a key factor that can limit visibility of single atoms. Care must be taken to have a very clean, dry system for imaging. This includes having cold traps, multiple ion pumps, turbo pumps, titanium sublimation pumps, sorption pumps, stage heaters, and load locks (Dorothy Hoffman, Handbook of Vacuum Science and Technology).

Step 112 illustrates an embodiment of the invention, where the labeled nucleic acid on the TEM grid is imaged by electron microscopy. The invention should not be construed to be limited to a TEM grid at this step, and the substrate may be any imaging substrate known by those of skill in the art. Also, any suitable electron microscope may be used (e.g., a Titan-80-300, Nion UltraSTEM, or VG 501 electron microscope), preferably with suitable aberration correctors, using HAADF STEM to visualize the position of the labels.

In step 114, the nucleic acid sequence data is generated and analyzed. According to one embodiment of the invention, FIG. 18 shows a system for analyzing a nucleic acid sequence, which may include an electron microscope 2202, a processor module 2204, at least one memory module 2206, an analyzer module 2208, a user interface 2210, and a network interface 2212. Electron microscope 2202 may be configured to generate an electronic signal representing electron dense regions. Analyzer module 2208 is configured to analyze the nucleic acid sequence based on the electronic signal generated by electron microscope 2202. The at least one memory module 2206 is adapted to store at least one of the electronic signals representative of the nucleic acid sequence generated by the electron microscope and/or the analysis. User interface 2210 is configured to allow the user to interact with the analysis. In a particular embodiment, the at least one memory module includes separate memories for storing the analysis and for storing the electronic signal representing the nucleic acid sequence. In a further embodiment, the analyzer 2208 and the at least one memory module 2206 are remote from the electron microscope and connected to the electron microscope through a network. One or more of the modules may be located in the same device, such as a computer or processor, to perform the various functions. Alternatively, the modules may be separate pieces of structure to perform the various functions.

In a more specific embodiment, data for analysis may be collected from a commercially available system or a custom system as shown in FIG. 19. The system in FIG. 19 may include an electron microscope 1902 (commercially available or a custom microscope), a processor detector 1904, an image recognition computer 1906, at least one memory module 1908, an analyzer module 1910, a user interface 1912 (conceptual view on screen), and an optional network interface (internet or intranet). The electron microscope may be configured to generate an electronic signal representing a nucleic acid sequence. The analyzer module 1910 is configured to analyze the nucleic acid sequence based on the electronic signal generated by electron microscope 1902. In some embodiments the memory is a medium selected from hard or floppy disks, optical media, compact disc (CD), digital versatile disc (DVD), semiconductor media, and flash memory.

The final sequence information is assembled either manually or, preferably, using an image recognition system. Nucleic acid spacing information is generated from the high data output detector, which may be, for example, a CCD detector, CMOS or PMT. An algorithm is employed to determine, from the information received from the high data output CCD detector, the spacing of the osmium label, for example, and accordingly to determine the sequence of the specific bases for a given labeling reaction batch. The information received from the CCD is stored on a memory and analyzed. The memory may include, e.g., magnetic media such as conventional hard or floppy disks, optical media such as compact disc (CD), digital versatile disc (DVD), or the like, and/or semiconductor media such as flash memory. Algorithms (computer programs) for sequence assembly are well known. Programs that may be used or adapted for use in the invention include, for example, DNA Baser and Cap3, which are the most common sequence assembly software programs used in the art, such as those disclosed by U.S. Pat. No. 6,760,668 entitled “Method for Alignment of DNA Sequences with Enhanced Accuracy and Read Length,” and U.S. Pat. No. 6,988,039 entitled “Method for Determining Sequence Alignment Significance,” the disclosures of which are expressly incorporated herein by reference in their entirety. See also worldwide web at dnabaser.com/index.html (Heracle Software, Lilienthal, Germany); and Huang, X., and Madan, A., 9 GENOME RES. 868-877 (1999). The specific parameters of assembly will depend in part on the nature of the label(s) used. In one embodiment, for example, for each strand imaged, information including the relative position for each labeled base and each unlabeled base is correlated with the specificity of the label (e.g., whether all T's were labeled, or all T's and all C's) to determine the positional sequence for the labeled bases, and the information is stored. Information stored for a strand and its deduced complement is compared to information generated for other strands and their complements, and matches are identified. Contiguous sequences are identified and assembled to produce a complete sequence.
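The step of converting measured label positions into a positional sequence can be illustrated as follows. This is a minimal sketch, assuming the strand is straight, the base-to-base spacing is uniform and known, and label positions are measured along the strand from its first base. The function name, the "." symbol for an unlabeled site, and the input format are illustrative assumptions, not the image recognition system itself.

```python
from typing import Iterable


def positional_pattern(label_positions_nm: Iterable[float],
                       strand_length_nm: float,
                       spacing_nm: float = 0.5,
                       symbol: str = "T") -> str:
    """Map label positions to base sites; unlabeled sites are written as '.'."""
    n_sites = round(strand_length_nm / spacing_nm) + 1
    pattern = ["."] * n_sites
    for pos in label_positions_nm:
        idx = round(pos / spacing_nm)  # nearest base site
        if 0 <= idx < n_sites:
            pattern[idx] = symbol
    return "".join(pattern)


if __name__ == "__main__":
    # Labels seen at 0.0, 1.5 and 2.0 nm on a 3.0 nm stretch, 5 angstrom spacing.
    print(positional_pattern([0.0, 1.5, 2.0], strand_length_nm=3.0))
```

For the inputs shown, the seven sites are read as `T..TT..`: each unlabeled site keeps its place in the pattern rather than shifting the sites that follow.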

When the nucleic acid being sequenced is from a previously sequenced genome (e.g., mouse, human, bacterial, viral), the initial sequence data (e.g., positional sequences within various strands) can be matched to the known reference sequence to accelerate analysis.
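Matching a positional sequence against a known reference can be sketched as a sliding comparison. The following is a minimal illustration, assuming complete labeling of one base type (every T labeled, no other base labeled) and a pattern in which "." marks an unlabeled site. The function name and pattern format are hypothetical, and a practical implementation would need to tolerate missed labels and search the complementary strand as well.

```python
from typing import List


def find_matches(pattern: str, reference: str, labeled_base: str = "T") -> List[int]:
    """Return 0-based reference offsets consistent with the labeling pattern."""
    hits: List[int] = []
    for start in range(len(reference) - len(pattern) + 1):
        window = reference[start:start + len(pattern)]
        # A site must be labeled exactly where the reference has the labeled base.
        if all((p == labeled_base) == (r == labeled_base)
               for p, r in zip(pattern, window)):
            hits.append(start)
    return hits


if __name__ == "__main__":
    reference = "ACGTTAGCTTGA"
    print(find_matches("TT..", reference))   # short pattern, two candidate sites
    print(find_matches("TT...", reference))  # longer pattern, one candidate site
```

Longer positional sequences narrow the set of candidate locations, which is why the long read lengths of straight, extended strands are helpful for reference matching.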

In one aspect the invention comprises analyzing a nucleic acid sequence (A's, T's, G's, and C's) stored in a memory, wherein said sequence was determined by the methods described hereinabove. In one aspect the invention comprises receiving imaging data (e.g., the positions of labeled and unlabeled bases), positional sequence (optionally positional sequence in which at least one base is undetermined) or other sequence information, and processing the data to determine the nucleotide sequence of a nucleic acid sample. Typically the data are received in electronic form. In one embodiment the nucleic acid sequence is a genomic sequence of a human subject. In one embodiment the analyzing comprises determining at least one of the presence or absence of one or more single nucleotide polymorphisms, copy number, variants, indels, rearrangements, or whole genome sequences.

An illustrative image of a single stranded DNA strand with T's labeled is shown in FIG. 19, with a diagram illustrating how the pattern of heavy labels corresponds to partial, base-specific sequence information for the area imaged. Information combining multiple imagings of the same underlying sequence with different labels and/or reaction conditions allows for highly accurate sequence determination.

Because the strand is stretched out straight, the backbone of the nucleic acid does not have to be tracked, so labeling every base with a unique heavy label is not necessary. FIG. 20 illustrates the ambiguity inherent in alternative preparation methods. Using the methods of the invention, described above and in the specific examples below, phasing errors will not be introduced, as a missing label will simply be “read” as a blank spot.

Multiple labeling patterns can be combined bioinformatically to determine the underlying nucleic acid sequence. The methods of the invention provide efficiency of throughput as a result of single-molecule placement control. Because the strands are arranged in a highly predictable fashion, both individually (i.e., with consistent base-to-base spacing) and as predictably parallel, dense arrays of single strands, image analysis will be highly efficient. The methods of the invention also allow the stretching force to be controlled, thereby allowing the degree of base-to-base spacing to be optimized.
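
As a hedged illustration of combining labeling patterns, the following Python sketch merges a pattern in which only T's are labeled with a pattern of the same sequence in which T's and C's are labeled. It assumes idealized, complete, and fully specific labeling and patterns that are already aligned position by position. The function name and the symbols ("-" for unlabeled, IUPAC "R" for an unresolved purine) are hypothetical.

```python
def combine_patterns(t_only, t_and_c):
    """Merge two aligned labeling patterns into per-base calls.

    t_only  : pattern from a reaction assumed to label only T's
    t_and_c : pattern from a reaction assumed to label T's and C's
    Returns "T", "C", or "R" (A or G, unresolved) for each position.
    """
    if len(t_only) != len(t_and_c):
        raise ValueError("patterns must cover the same positions")
    calls = []
    for t_mark, tc_mark in zip(t_only, t_and_c):
        if t_mark != "-":
            calls.append("T")   # labeled under the T-specific condition
        elif tc_mark != "-":
            calls.append("C")   # labeled only when C's also react
        else:
            calls.append("R")   # unlabeled in both: a purine
    return "".join(calls)


# Invented example: both patterns derive from the sequence GATTACA.
print(combine_patterns("--TT---", "--YY-Y-"))  # RRTTRCR
```

Under the same assumptions, the remaining purine positions could be resolved by applying the procedure to the complementary strand, where A's and G's pair with labeled T's and C's.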

Without further elaboration, it is believed that one skilled in the art using the preceding description can utilize the invention to the fullest extent. The following examples are illustrative only, and not limiting of the disclosure in any way whatsoever.

EXAMPLES

Example 1

The DNA of interest is isolated from a sample using techniques known to those of ordinary skill in the art. The DNA of interest is then divided into four solutions, i.e., solutions 1, 2, 3, and 4. Each solution is reacted with Os-bipy for different lengths of time and with different concentrations of Os-bipy in order to achieve different base-specific labeling densities.

Solutions 1 and 2 are reacted for 20 hours at 26 degrees Celsius with a four-fold molar excess of osmium tetroxide and of 2,2′-bipyridine in TE buffer, pH 8.0, with 100 mM Tris and 10 mM EDTA; these conditions label about 100% of T's, about 85% of C's, about 7% of G's, and about 0% of A's. Solutions 3 and 4 are reacted under the same conditions as solutions 1 and 2, except that the reaction proceeds for only 15 minutes and only a 2.5-fold molar excess of osmium tetroxide and of 2,2′-bipyridine is used; these conditions label about 90% of T's, about 8% of C's, about 5% of G's, and about 0% of A's. However, prior to the Os-bipy reaction, solutions 2 and 4 are first subjected to a bisulfite treatment to convert unmethylated C residues to U. Bisulfite protocols are known to those skilled in the art, but a very condensed version of such a protocol is described here: add final concentrations of 0.05 mM hydroquinone, 3.3 M sodium bisulfite, and 3 ng/μl denatured DNA to a centrifuge tube; cap the tube and shield it from light with aluminum foil; incubate at 55 degrees Celsius for 8 hours; purify the DNA by standard methods; and resuspend the DNA in TE buffer (pH 8.0, 100 mM Tris, 10 mM EDTA) for subsequent heavy atom labeling. This allows the pattern of methylation to be determined by comparing sequences from solutions treated with bisulfite to those left untreated. See Jelen et al., 10 GEN. PHYS. AND BIOPHYS. at 461. After the labeling reaction, unbound osmium label is removed by ultrafiltration to minimize extraneous heavy atom contamination during the imaging process, and the DNA polymers are diluted to about 0.1 ng/μl in TE buffer, pH 8.
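
The approximate labeling fractions given above can be used to estimate how informative a single observation is. The following Python sketch applies Bayes' rule to one base position. It is illustrative only: the equal prior over the four bases, the treatment of positions as independent, and the names `LABEL_FRACTION` and `base_posterior` are assumptions made for the example and are not part of the described protocol.

```python
# Approximate fraction of each base labeled under the two reaction
# conditions of this example ("long": 20 hours, four-fold excess;
# "short": 15 minutes, 2.5-fold excess).
LABEL_FRACTION = {
    "long":  {"T": 1.00, "C": 0.85, "G": 0.07, "A": 0.00},
    "short": {"T": 0.90, "C": 0.08, "G": 0.05, "A": 0.00},
}


def base_posterior(condition, labeled, prior=None):
    """Probability of each base at one position, given whether a label
    was observed there under the named reaction condition."""
    fractions = LABEL_FRACTION[condition]
    if prior is None:
        # Assumption: all four bases are equally likely beforehand.
        prior = {base: 0.25 for base in "ACGT"}
    weights = {}
    for base, p_label in fractions.items():
        likelihood = p_label if labeled else 1.0 - p_label
        weights[base] = prior[base] * likelihood
    total = sum(weights.values())
    return {base: weight / total for base, weight in weights.items()}


posterior = base_posterior("short", labeled=True)
print(round(posterior["T"], 3))  # 0.874
```

Under these assumptions, a label seen after the short reaction indicates a T with a probability of roughly 87%, while a label seen after the long reaction indicates a T or a C with a combined probability of roughly 96%. Combining both reactions sharpens the call for each position.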

A sharp needle is made by heating a glass fiber in an ethanol flame and pulling to separate it, thereby resulting in two sharp needles with radii of curvature at their ends of less than about 200 nm. One needle is then coated with PMMA by dipping it in a 0.5% solution in acetone and drying it in an acetone atmosphere. The needle is then glued onto a holding piece or clamped onto an arm of a positioner such as a piezo actuator (e.g., programmable AFM silicon cantilever) to control the position and motion of the needle.

The needle is dipped into the DNA polymer solution containing the DNA polymers of interest and then pulled out to extend or “pull out” the DNA strands into empty space from the DNA polymer solution. The extended DNA polymer strands should remain perpendicular to the surface of the DNA polymer solution.

The DNA polymers are attached to an ultra-flat silicon TEM grid that is covered with a thin carbon film having a thickness in a range of about 1.5 nm to about 5 nm. The silicon TEM grid is made by evaporating carbon on one side of a silicon piece and then etching the back side, ensuring that the carbon film is flat, using techniques known to those of ordinary skill in the microfabrication industry.

The grid is placed next to the droplet of DNA polymer solution so that the DNA polymer strand suspended between the droplet and the sharp needle is allowed to contact the grid before being completely “pulled out” of the DNA polymer solution. The DNA polymer extended into the empty space is only brought into contact with the TEM grid when the strand is placed on it. The angles of the DNA polymer solution with respect to the grid and the motion of the needle are taken into consideration to ensure this. Using this threading technique, thousands to millions of strands can be placed parallel to each other on the TEM grid.

The DNA polymer solution is pulled away from the spanning position using a pipette tip or micropositioners that move the base on which the droplet rests. About 0.5 nm to about 6 nm of carbon is then evaporated to stabilize the DNA polymer. The spanning is done in a controlled environment chamber to minimize evaporation and dust.

Imaging is performed in a commercially available Titan 80-300 (FEI Company, Hillsboro, Oreg.) with aberration correctors in Z-contrast STEM mode, or in another suitable high resolution electron microscope. The information from the detector is used to determine the spacing of the osmium labels and, accordingly, the sequence of the DNA of interest.

Example 2

The methodology of Specific Example 1, above, is carried out in the same manner, with the exception that the DNA polymer is shelf spanned down on a piece of PDMS as shown in FIG. 17. The PDMS is then set in contact with a carbon film on mica so that the DNA polymer is transferred to the carbon. The PDMS piece is then pulled away, leaving the DNA polymer behind. About 1 nm to about 5 nm of carbon is evaporated on top of the DNA polymer on the carbon film on the mica. The carbon film is floated onto water and picked up with a TEM grid. The DNA polymer is then imaged as described in Specific Example 1 to determine the sequence information.

The examples given above are merely illustrative and are not meant to be an exhaustive list of all possible embodiments, applications, or modifications of the invention. Thus, various modifications and variations of the described methods and systems of the invention will be apparent to those skilled in the art without departing from the scope and spirit of the invention. Although the invention has been described in connection with specific embodiments, it should be understood that the invention as claimed should not be unduly limited to such specific embodiments. Indeed, various modifications of the described modes for carrying out the invention which are obvious to those skilled in molecular biology, immunology, chemistry, biochemistry, or in the relevant fields are intended to be within the scope of the appended claims.

The disclosures of all references and publications cited above are expressly incorporated by reference in their entireties to the same extent as if each were incorporated by reference individually.

1-93. (canceled)
94. A method of determining the sequence of 20 or more consecutive bases in a nucleic acid, the method comprising: (a) providing a nucleic acid of 20 or more bases in length; (b) imaging the nucleic acid by electron microscopy to obtain electron microscopy (EM) data; and (c) determining the sequence of the nucleic acid from the EM data.
95. The method according to claim 94, wherein the method determines the sequence of 50 or more consecutive bases in the nucleic acid.
96. The method according to claim 95, wherein the method determines the sequence of 1,000 or more consecutive bases in the nucleic acid.
97. The method according to claim 95, wherein the method determines the sequence of 10,000 or more consecutive bases in the nucleic acid.
98. The method according to claim 95, wherein the method determines the sequence of 100,000 or more consecutive bases in the nucleic acid.
99. The method according to claim 95, wherein the method determines the sequence of 1,000,000 or more consecutive bases in the nucleic acid.
100. The method according to claim 94, wherein the nucleic acid sequence is obtained at a rate of at least 1,000 bases per second.
101. The method according to claim 94, wherein the EM data is obtained by transmission electron microscopy.
102. The method according to claim 100, wherein the imaging comprises imaging 10,000 or more bases per second.
103. The method according to claim 94, wherein the nucleic acid has a pre-determined configuration.
104. The method according to claim 103, wherein the nucleic acid has a linear configuration with consistent base-to-base spacing.
105. The method according to claim 104, wherein the base-to-base spacing ranges from 3 to 7 Å.
106. The method according to claim 105, wherein the providing step comprises: a) introducing a nucleic acid binding tool into a fluid composition of the nucleic acid so that the nucleic acid binds to the nucleic acid binding tool; b) removing the nucleic acid binding tool from the fluid so that the nucleic acid is stretched into space between an air/fluid interface and the nucleic acid binding tool; and c) depositing the stretched nucleic acid onto a substrate.
107. The method according to claim 106, wherein the stretched nucleic acid is deposited onto the substrate using a shelf threading protocol.
108. The method according to claim 106, wherein the stretched nucleic acid is deposited onto the substrate using a gap threading protocol.
109. The method according to claim 94, wherein the nucleic acid is a contrast agent labeled nucleic acid.
110. The method according to claim 109, further comprising producing the contrast agent labeled nucleic acid by contacting the nucleic acid to be sequenced with a contrast agent that directly labels the nucleic acid to be sequenced to produce the contrast agent labeled nucleic acid.
111. The method according to claim 110, wherein the method comprises producing two or more different populations of differentially labeled nucleic acid sequences, wherein each of the different populations comprises nucleic acids of identical sequence.
112. The method according to claim 111, wherein the two or more different populations are labeled with the same label.
113. The method according to claim 110, wherein only a portion of the bases are labeled in the contrast agent labeled nucleic acid.
114. The method according to claim 110, wherein the contrast agent labeled nucleic acid is labeled with a high-Z atom contrast agent.